NASA-CR-196777 


//C/77 

^Ci2>o2 - — 


STEADY AND UNSTEADY THREE-DIMENSIONAL TRANSONIC FLOW 
COMPUTATIONS BY INTEGRAL EQUATION METHOD 


by 


Hong Hu, Principal Investigator 


Hampton University 
Hampton, Virginia 23668 


Final Technical Report 
of 

Grant NAG-1-1170 


<N 



00 



IT 


f\J 


l/> 

O 

«-» 

05 

m 

1 


m 

in 

u 

rvi 

o 

c 

o 

z 


o 


rsi 

O 


for the period of 
September 1990 - August 1994 


rn 

o 


m 

3 

< 

> — 

CO ro I 

o c a 
z 00-0 

< J Z U- ^ o 

< O o CO 

> Z O r-» ^ 
Q O H O 

< *~< < X * 

til i/l K a** 1 * 
h- Z O LU 01 t 

oo uj a. x oo > 


~ O O a 4-> D 
r- I *-< u 

uj :* k o c 

n in o < a o 

nO oc -j z> a> 4-> 

0 i a. ad a 

p-» k uj B 

1 O •— 3 

a: >- *-« _j <tJ X 

U O Z < U ^ 

i < o d 

< uj 00 O C 

in h Z UJ r ^ 

< 00 < *— U 0 s 

zzaz 0 ) ^ 

W O I— H H r-4 


Prepared for 

National Aeronautics and Space Administration 


STEADY AND UNSTEADY THREE-DIMENSIONAL TRANSONIC FLOW 
COMPUTATIONS BY INTEGRAL EQUATION METHOD 


by 

Hong Hu, Principal Investigator 


Hampton University 
Hampton, Virginia 23668 


Final Technical Report 
of 

Grant NAG-1-1170 


for the period of 
September 1990 - August 1994 


Prepared for 

National Aeronautics and Space Administration 



FOREWORD 


The research reported in this document was performed under the grant, NAG-1- 1170, from 
the National Aeronautics and Space Administration (NASA), Langley Research Center 
through HBCU (Historically Black College and University) Program. Dr. Carson Yates, 
Jr. was the technical monitor for the period from September 1990*to August 1992. Mr. 
Walter Silva was the technical monitor for the period from September 1992 to August 1994. 
The computational time on CM-2 and CM-5 was provided by Numerical Aerodynamic 
Simulation (NAS) Program at the NASA Ames Research Center. Dr. Carson Yates, 
Jr. and Mr. Walter Silva have provided helpful suggestions and guidance during this 
investigation. Two graduate students, Mr. Terry Logan and Mr. Min Soe, and three 
undergraduate students, Mr. Issac Jackson, Ms. Jada Paysour and Mr. Jason Bryan, at 
Hampton University, have participated and contributed to this work. 



CONTENTS 


Part I. Unsteady Flows 

Part II. Steady Flows with Shock Fitting 

Part III. MPP Implementation and Performance Analysis 

References 

Figures 

Tables 

Program Lists 
List of Publications 
Attachments - Publications 


Page 

2 

9 

16 

22 

24 

57 

60 


66 
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Hong Hu 

Hampton University 
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This is the final technical report of the research performed under the grant: NAG- 
1-1170, from the National Aeronautics and Space Administration. The report consists of 
three parts. The first part presents the work on unsteady flows around a zero-thickness 
wing. The second part presents the work on steady flows around non- zero thickness wings. 
The third part presents the massively parallel processing implementation and performance 
analysis of integral equation computations. In the end of the report, publications resulted 
from this grant are listed and attached. 
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PART Is UNSTEADY FLOWS 


SUMMARY 


This part presents the development of an unsteady integral equation (or called field- 
panel, field-boundary element) scheme for solving the full-potential equation for transonic 
unsteady zero- thickness wing flows. The unsteady full-potential equation is written in a 
moving frame of reference, in the form of the Poisson’s equation. Compressibility and 
unsteadiness are treated as non-homogeneity. The integral equation solution in terms 
of velocity field is obtained by the Green’s theorem. The solution consists of a wing 
surface integral term of vorticity distribution, a wake surface integral term of free-vortex 
sheet and a volume integral term of compressibility and unsteadiness over a small limited 
domain around the wing. Numerical solutions are obtained by a time-marching, iterative 
procedure. Time-derivative term is calculated by a second-order backward finite- difference 
scheme. To be consistent with the mixed-nature of flows, the Murman-Cole type-difference 
scheme is used to compute the derivatives of the density. The present scheme is applied to 
flows around a zero-thickness rectangular wing at transonic speed undergoing acceleration 
motion and transient pitching motion, respectively. The time history of wing surface 
pressure distributions is presented. 
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1-1. INTRODUCTION 


Starting in 1970, a great deal of progress has been made in solving transonic flows by 
using the finite-difference method (FDM) and finite-volume method (FVM). Although 
the Navier-Stokes equation formulation for the transonic flow computations has been 
understood as the best model and the FDM and FVM are successful in dealing with 
transonic flows, the computation of the unsteady Navier-Stokes equations over complex 
three-dimensional configurations is expensive, particularly for time-accurate unsteady flow 
computations. There are also major technical difficulties in FDM and FVM for generating 
suitable grids for complex three-dimensional aerodynamic configurations. 

The experience has shown that accurate solutions can been obtained for many tran- 
sonic flows using the inviscid modeling of the full- potential equation. For transonic flows 
without strong shocks and massive separations, the full-potential equation is an adequate 
approximation to the Navier-Stokes equations. The integral equation method (IEM) for 
the potential equation is an alternative to the FDM and FVM. Moreover, the IEM has 
several advantages over the FDM and FVM. The IEM involves evaluation of integrals, 
which is more accurate and simpler than the FDM and FVM, and hence a j2 c ?j r f e 
(field-panels) can been used in IEM. The IEM automatically satisfies the far-field bound 
ary conditions and therefore only a small limited computational domain is needed. The 
IEM does not suffer from the artificial viscosity effects as compared to FDM and FVM for 
shock capturing in transonic flow computations. The generation of the three-dimensional 
grid for complex configuration is not difficult in the IEM, since the mapping from physical 
plane to computational plane is not required. 

Because of these advantages associated with the IEM, it is highly desirable to fully de- 
velop the IEM to treat transonic flows. Integral equation methods for transonic flows have 
been developed by several investigators 1 ” 14 during the past few years for steady airfoil, 
wing and aircraft configurations and unsteady airfoils. The capability, accuracy and efn- 
ciency of the integral equation method for steady transonic flows have been investigated. 
The possibility of treating unsteady transonic flows has been investigated for an airfoil 
undergoing pitching oscillation 13 . The solutions show that the unsteady effects and the 
motion of the shock have been predicted accurately and efficiently by the method 13 . This 
part presents the unsteady three-dimensional integral equation scheme, which is the exten- 
sion of the unsteady two-dimensional scheme 13 , coupling with the steady three-dimensional 
scheme for zero- thickness wing 9 . Two numerical examples axe presented to demonstrate 
the capability of the unsteady wing flow computations using the integral equation method. 

1-2. FORMULATION 

For a general unsteady motion of a body, the governing equations are simple to solve 
if a moving (body-fixed) frame of reference formulation is used. This formulation does 
not require the grid-motion calculation since the grid is rigidly fixed in the frame. In 
addition to the space-fixed frame of reference OXYZ , a moving frame of reference oxyz 
is introduced as shown in Figure 1. The moving frame of reference oxyz is translating 

at a velocity of V Q (t) and rotating around a pivot point, r p = (x p ,y p , z p ), at an angular 
velocity of D(f). The relation for the absolute velocity (V), relative velocity (V r ) and 
transformation velocity (V a + V e = V 0 + fl X (r — r p )) is given by 

V = Vr + Vo + ti X (f-fp) (1) 
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where r is the position vector measured in the moving frame of reference. 

1-2.1 Governing Equations 

The non-dimensional unsteady full-potential equation in the moving frame of reference 



was derived as follows 14 : 

V 2 $ = G 


(2) 


with 

G = Gj + (?2 


(3) 


5* 

II 

1 


(4) 


r 1 &P 

° 2 p dt 


(5) 


and 

/.= {i + ^[-v r 2 + (' ? . + U) 2 


(6) 


where the characteristic parameters of the wing surface panel length, the density and 
the speed of sound at infinity have been used; $ is the absolute velocity potential (V = 
= v'$), Gi the compressibility, G 2 the unsteadiness, p the density, k the gas specific 
heat ratio, and t the time; and the prime (/) refers to the derivative with respect to the 
moving frame of reference. The associated boundary conditions are described in the next 
sub- section. 

1-2.2 Boundary Conditions 

The boundary conditions are surface no-penetration condition, Kutta condition, in- 
finity condition, wake kinematic and dynamic conditions. They are described as follows: 

V r ■ n g = 0 on g(r) = 0 (7) 


A C p \s P = 0 (8) 

V$ — ► 0 away from g{r) — 0 and w(r , t) = 0 (9) 

^- + V r -n w = 0 on w(ft) = 0 (10) 

|Vto| at 

AC P — 0 on w(r, t) = 0 (11) 


where n„ is the unit normal vector of the wing surface, g{f — 0; G„ is the surface pressure 
coefficient; the subscript sp refers to the edges of separation, and in^the present scheme 
the only separation from the wing trailing edge is considered; and w(r,t ) = 0 is the wake 
surface. 


1-2.3 IE Solution 
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By using the Green’s theorem, the integral equation solution of Eq. (2) in terms of 
the relative velocity field is given by 


V r (x, y, 2, t) = - V 0 (t ) - 6(f) x (r - f p ) 




47T 


J J Mi!hM2L *ds(i ,v,0 


-edd£dr}d£ 


( 12 ) 


where 7 is the surface vorticity distribution; the subscripts g and w refer to the wing and 
wake surfaces, respectively; ds is the infinitesimal surface area; J;he vector d is given by 
d = ( X - 0* + (y- v)! +( z ~ 0^; and Zd is defined by e d = d/\d\. 

In Eq. (12), the first integral term is the contribution of the wing surface vorticity; 
the second integral term is the contribution of the wake vorticity; and the third integral 
term is the contribution of the full compressibility and the unsteadiness It should be 
noticed that the infinity condition, Eq. (9), is automatically satisfied by the integral 
equation solution. It should be also noticed that the integrand of the volume integral 
term, the third integral term in Eq. (12), decreases rapidly with increasing distance, d, not 
only because of the factor of 1/d 2 but also G(£,i ?,(,<) diminishes rapidly with increasing 
distance. Consequently, for computational purposes, the volume integral term needs to be 
addressed only within the immediate vicinity of the body. This is the beauty of the IE 
methods. 


1-3. COMPUTATIONAL SCHEME 


1-3.1 Discretisation 


In terms of discretisation, the wing and its wake are represented by triangular vortex 
panels. A uniform rectangular parallelopiped type of volume elements are used throughout 
the flow field. The discretized integral equation solution becomes 


V r (x,y,z,t) 

= -V 0 (f)-6(f) x(f-r p ) 
LG NG 


+hiin 

47r i=l fc=l J 

lvmvnv r r r 1 

- Hi 

47 r ,= ij=ik=\ j j jVi * k 


( 13 ) 
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where the indices, i, j and k refer to the surface panels and field elements; LG x NG is the 
total number of wing surface panels; LW x NW is the total number of wake surface panels, 
and LV x MV x NV is the total number of field elements. A constant G-distribution is 
used over small field element, while a linear 7-distribution is used over small surface panel. 

1-3.2 Time- Marching, Iterative Scheme 

Due to the nature of the nonlinearity of the flow, the solutions are obtained through 
a time-marching, iterative procedure, where the compressibility, G\, unsteadiness, Cr 2 , 
and the wake shape and its strength are updated within each iteration. The solution 
procedure follows the successful form of the unsteady two-dimensional scheme and the 
steady three- dimension scheme 9 , hence only a brief description is given: 


Step 1 - at time step (k = 0 ) : ^ ^ 

This step corresponds to the steady flow computation. At this step, G 2 and ( d ^ / Ot) 
are set to be zero. Equations ( 3 ), ( 4 ), (6) and ( 13 ) with the boundary conditions are solved 
iteratively until the solution converges. Here, two loops are used. The inner loop is used 
to calculate and check the convergence of the non-linear term, G (o) . The outer loop is 
used to update and check the convergence of the wake shape and wing surface pressure 

distribution. 


Step 2 - at time step jjc = w) : . ,. , , 

This step is unsteady time marching step. At thisjstep, _the wing translation and angular 

velocities are calculated by the given functions, V 0 = V 0 (t) and D = respectively. 

The orientation of the wing changes due to the angular velocity. Two numerical examples 
are considered. In the first numerical example, the wing is given an acceleration motion. 
Translation Mach number, M 0 (t ) = V 0 (t)/ ct<x>i is given by 

M 0 (t) = Mi + M 0 t ( 14 ) 


where Mi is the initial value of M 0 (t) and M 0 is the rate of change of M 0 (t). In the second 
numerical example, the wing is given an forced transient pitching motion. Angle ot attack, 

a W , is given by a(t) = ai + & t ( 15 ) 


-♦ —* -+ r*. 

where a, is the initial value of o(t) and a is the 2-component of Cl (0 = Oi + 0 ] + ot '). 
The unsteadiness, G 2 ”\ is calculated numerically by a second-order accurate backward 
finite-difference scheme, which is given by 


G<"> = 


iav ( ”> 

> dt ’ 

1 ci^ n ~ 2) +c 2 p (n ~ 1) +c 3 p (n) 

~p( n ) c 4 


( 16 ) 


where c a = 1; c 2 = + &t^)/ c 3 = c 2 - 1; and c 4 = -(A x) + 

_ c 2 A^ n ^. The time derivative term of the potential, (d'<f?/dt) n , can been numer- 
ically calculated by $ (n) and and hence $ (n) and must be calculated by 
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integration of the velocity field numerically. In order to avoid numerical error when doing 
this numerical integration, Eq. (6) is used to compute (5 Q/dt'f' ^ distribution. Thus, Eq. 
(6) takes the form 


at « — 1 2 * 


(H) 


With Gi” ) obtained from Eq. (16) and (a'$/3() < " ) obtained from Eq. (17), Step 1 is 
repeated until the solution converges. 

Step 3 - at time step (fc = n + 1) : 

Step 2 is repeated for time step (n + 1). 

1-4. NUMERICAL EXAMPLES 


The present scheme has been applied to a zero-thickness, rectangular wing with aspect 
ratio of 2. The half-span of the wing and the wake is divided into 10 x 6 and 10 x 10 
quadrilateral panels, respectively. Each quadrilateral panel consists of 2 triangular panels. 
The one-half of the computational domain is divided into 23 x 9 x 9 field volume elements 
in x, y and z directions, respectively. Two numerical examples are presented as mentioned 
before. The first one is the acceleration motion and the second one is the pitching motion. 


Acceleration Motion 

In this numerical example, the wing is given an acceleration motion at an angle of 
attack of 5 degrees. The translation Mach number is given by 

M 0 (t) — 0.7 + O.li (18) 


where t is given by nAt\ n is the time step index and At is the time step size, which is nu 
merically chosen as 0.1 for this case. The time history of the surface pressure distributions 
is presented in Figures 2a through 2c at 2 = 1.11, z = 4.44 and z = 7.78, respectively. 
The surface pressure distributions at M„(t) = 0.80 over different wing across-sections are 
shown in Figure 2d. It can be seen that the shock is located in the region between x = 0.67 
and x = 1.67. The chord length is 10 units with the leading edge at x=0. Due to the Jack 
of experimental data and other computational results, an integral equation solution 9 for 
steady flow is provided in Figure 2a to serve as an reference solution. 


Transonic Pitching Motion 


In this numerical example, the wing is given an forced transient pitching motion at a 
transonic translation Mach number of 0.7. In this case, the contribution of the rotation 

of the moving frame of reference has been included, which is represented by ft x (r - r p ). 
To simplify the problem, only a pitching motion in ary-plane is considered. The angle of 
attack is given by 

om = 5° + 0.5°< (19) 


where At is numerically chosen as 1 in this case. The time history of the surface pressure 
distributions is presented in Figures 3a through 3c at 2 = 1.11, 2 = 4.44 and 2 = 7.78, 


7 



respectively. The surface pressure distributions at a(t) = 10° over different wing across- 
sections are shown in Figure 3d. The results presented here axe self-explanatory, which 
show that the present scheme can capture unsteady effects, although the accuracy of the 
solution is to be determined. 


1-5. CONCLUDING REMARKS 

An unsteady integral equation scheme based on the full-potential equation formulation 
for transonic flows is developed. The scheme is capable of handling general unsteady motion 
in three dimensions, although only an acceleration motion and a pitching motion m xy- 
plane are implemented and tested. The computational results show that the scheme is 
capable of capturing shock and unsteady effects. The scheme is very stable, and the solution 
converges within 1-5 iterations per time step. The number of iterations for convergence 
decreases when t increases. The large time steps (At = 1.0, or 0.1) used in the present 
computations make this scheme very efficient for unsteady flow solutions. It is necessary o 
emphasize that the main aim here has been a demonstration of unsteady three-dimensiona 
transonic flow computations using integral equation method, and hence the accuracy ot 
the solution is less of a priority at this time. 
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PART II: STEADY FLOWS WITH SHOCK-FITTING 


SUMMARY 


This part presents the development of an integral equation shock-fitting field-panel 
method for three-diemnsion (3D) transonic flows. In this method, the full-potential equa- 
tion, written in the form of the Poisson’s equation, is solved by integral equation field-panel 
method. The solution consists of a wing surface source panel integral term, a field-volume 
integral term of compressibility over a small limited domain, and a shock panel integral 
term. Due to the non-linearity of flows, solutions are obtained through an iterative proce- 
dure. Instead of using a field-panel refinement procedure, a shock-fitting technique is used 
to fit the shock. Finally, numerical examples are provided to demonstrate the accuracy of 
the method. The major differenc of this part from the previous part is that this part deals 
with non- zero thickness wing flows and the accuracy of the solution is discussed. 
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II- 1. INTRODUCTION 

As mentioned in the previous part, integral equation methods for transonic flows have 
been developed by several investigators 1 ' 14 during the past few years for steady airfoils, 
steady wing and aircraft configurations and unsteady airfoils. These methods solve either 
full-potential or transonic small disturbance equations using both surface and held panels 
The shock capturing technique was applied in these methods. The method of Kef. 11 
solves the full-potential equation for two-dimensional transonic flows, where both shock- 
capturing and shock-fitting techniques are applied. The capability of capturing shocks wit 
shock-capturing technique and improvement of the shock with shock-fitting technique was 
presented in the Ref. 11. The method is efficient and engineering accurate. In this part 
a method for computing steady 3-D flows is presented along with numerical examples 
to demonstrate the capability, accuracy and the potential of the present IE scheme for 
subsonic and transonic flow computations. The method is the extension of the steady 
2D method of Ref. 11 to three-dimensional flows. In order to use a coarse grid, which 
is particularly important in 3D calculations, the shock-fitting technique is applied to the 
present transonic flow calculations. 

II-2. FIELD-PANEL FORMULATION 
H-2.1 Governing Full-Potential Equation 

The non-dimensional steady full-potential equation is given by: 

V 2 $ = Gi ( 20 ) 


with 


G, = -^V 

P 


( 21 ) 


and 


r K ~ 1 

P = [ 1 + tM 1 


U 


— V 2 — VI 2 )] 


( 22 ) 


where the characteristic parameters, Poo , and c have been used; a is the speed of the 
sound, p the density, and c the wing root-chord length; and $ is the velocity potential 

(V$ = V = (u,v,U7)), Gi the compressibility, and k the gas specific heat ratio. 

Equation (20) is not in the conservative form but in the form of the Poisson’s equa- 
tion By writting the full-potential equation in the Poisson s form, the nonlinearity of 
the transonic flows can be treated as non-homogeneity and in terms of the IE solution 
this non-linearity is represented by field volume integral term And hence the classical 
surface- panel method can be extended into field-panel method for non-hnaer flows. The 

experience 


. 1-14 


IIVIUIOCI A. 

has shown that such non-conservative formulation has produce accurate 


solutions as long as the shock is not very strong. 

II-2.2 Boundary Conditions 

The general boundary conditions are surface no-penetration condition, Kutta condi- 
tion, infinity condition, and wake kinematic and dynamic conditions as mentioned m Part 
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I. For the present non-lifting flows, the only surface no-penetration condition and infinity 
condition are needed and they are given by: 


V ■ n g = 0 on g(r) = 0 


(23) 


and 


— ► 0 away from g{r) — 0 and w(r) — 0 


(24) 


where n g is the unit normal vector of the wing surface, g(f) = 0. 

II-2.3 IE Solution 

By using the Green’s theorem, the integral equation solution of Eq. (20) in terms of 
the velocity field is given by 


V(x,y,z ) = Voo 

+hllL GJ ^ lM 

+ hj Jj J ^r ^ Sjds ( ( ’ TI ' 0 


where is the free-stream velocity; q is the surface source distribution; the subscript, 
S, refers to the shock surface; ds is the infinitesimal surface area; the vector d is given by 
j*— ( x — £)t + (y — ri)j + (z — C)k; and ej is defined by e, j = d/\d\. It can be seen that the 
infinity condition, Eq. (24), is automatically satisfied by the integral equation solution, 
since the integrals become zero when d is large enough. 

II-2.4 Field-Panel Discretisation 

The formulation presented here can be easily extended to general lifting flows by 
including surface and wake vortex-panel integral terms, although the present computations 
are only made to symmetric non-lifting flows. In this non-lifting computational model the 
wing surface is represented by a number of uniform rectangular source panels. A uniform 
rectangular parallelopiped type of field- volume panels are also used throughout the flow 
field Constant surface and volume source ( q and G) distributions are assumed over wing 
/ shock surface panels and field volume panels. The discretized integral equation solution 


11 



in terms of surface and field-volume panels then becomes 


V(x,y,z) = V a 


LG NG 


?eVs(e ’’ ,,c) 

L V MV NV ret 1 

+ sEEE g ‘.m JJJ ffiWdvK 

i—l j = \ k=l 

MSNS r r 1 

+ sEE ■«>/£. ?*•*«• * c > 

j = lk=l 


( 26 ) 


where the indices, i, J and It refer to the surface and field panels; LG x NG is the total 
number of wing surface panels; LV x MV x NV is the total number of field panels; an 
MS x NS is the total number of shock surface panels. A sketch of the computational 
model is given in Figure 4, while the detailed wing surface panelling is given m Figure 5 
where the exact number of wing surface panels is shown for a typical case. 


II-3. COMPUTATIONAL SCHEME 


II-3.1 Iterative Scheme 

Due to the nature of the non-linearity of transonic flows, solutions are obtained 
through an iterative procedure, where the wing surface source strength and the compress- 
ibility over selected volume elements are updated through each iteration. The solution 
procedure follows the successful form of Ref. 11 of two-dimensional computations. Here 
only the treatment of shocks for transonic flow is described. 

II-3.2 Shock-Fitting Technique 

It should be mentioned that mathematically the second (volume) integral term of 
Eq (26) includes all compressibility effects including shock discontinuity. Since a relative 
coarse grid is used in the present IE computational domain where only 10 field panels are 
used over the wing chord, the contribution of the shock discontinuity is extracted from this 
volume integral term and it is represented explicitly by the third integral term of Eq. (26). 
It is very important to use coarse grid in 3D calculations, since the integral calculations 
over 3D field panels are very expensive. The strength of shock panels, q s , is equal to the 
difference of normal velocity across the shock. This can be shown by integrating Eq. (20) 
over an infinitesimal volume around an infinitesimal area of the shock surface and applying 
the divergence theorem, one gets 

A(V$) = Vin - V\ n = Gxe (27) 

where e is the infinitesimal thickness normal to the shock surface. By letting G\t = qs 
and using Rankine-Hugoniot relation, one finally obtains 

~ 11 Vl " (28) 
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where the subscripts 1 and 2 refer to the conditions ahead and behind of the shock, 
respectively; and the subscript n refers to the normal component to the shock. The purpose 
to use Rankine-Hugoniot relation is to introduce the effect of entropy change across the 
shock sinec the full- potential formulation uses isentropic flow assumption which is not 
true in the shock region. 


The constantly distributed, piece-wise continuous (in flow direction) oblique shock 
panels are used. The slope of shock panels is determined by the so called 9 - /3 - M 
relation as given by 

„ r M? sin 2 8—1 i / on \ 

to # = 2 co (/?[ Mf(7 + cos 2 /) j Ti ] ( 29 ) 

where 9 is the flow-deflection angle, and /? is the shock angle. 


In the present calculation, the shock panel term, the last term of Eq. (26), becomes 
active only after the sonic line (and hence the shock location) is fixed. In other words, the 
shock-capturing technique is first used to locate the shock, where the Murman-Cole type- 
difference scheme is used in consistent with the mixed-nature of transonic nows. The use of 
the Murman-Cole scheme is equivelent to the introducing of the artificial dissipation. The 
use of this artificial dissapation scheme within a shock-fitting scheme seems contradictory 
since some of their effects will cancel each other. But if we consider the shock-fitting as 
the way to give a correct inviscid shock and the Murman-Cole scheme as the way to give 
the artificial viscous effect, then the use of the Murman-Cole scheme with shock-fitting 
scheme will give a correct viscous shock, this is what it should be. 


II-4. NUMERICAL EXAMPLES 


The presesnt scheme is applied to rectangular wings of symmetric sections with differ- 
ent aspect ratios (AR) at different free-stream Mach numbers. The half-span of the wing 
surface (including upper and lower surfaces) is divided into (20to24) x(6tol2) quadrilat- 
eral panels depending on wing geometry and free-stream conditions. The one-halt of the 
computational domain is divided into 20 x 16 x (9tol8) field volume elements in chord, 
normal and span directions, respectively. The size of the computational domain is trom 
2c x 1.5c x 2.25c to 2c x 1.5c x 3c for different AR values in the chord, normal and span 
directions, respectively. It should be noted that the both surface- and field-panel sizes in 
chord (flow) direction are as large as 10% of chord length. Only symmetric flows with zero 
angle of attack (non-lifting flows) are considered, and attached flow assumption is also 
made. 

The first numerical example is made to the flow around a wing with a 5% thick 
circular arc section of AR — 3 at free-stream Mach number of 0.7, a shock-free subsonic 
flow, where the non-linearity effect is small. Figure 6a is the calculated surface local Mach 
contoures which shows that the flow is purly subsonic. The calculated surface pressure 
coefficients are presented in Figure 6b in terms of contours and Figure 6c in terms of line 
plot, along with the computational results obtained by the non-linear LTRAN3 lbD r D 
code 15 and by the linear SOUSSA IE code 16 at three span stations located at 0%, 50% 
and 90% of semi-span. As the figure shows, the presently calculated pressure distributions 
are in close agreement with the non-linear LTRAN3 results over the entire wing surface 
and agreement with the linear IE SOUSSA results except the discrepancy over leading and 
trailing edges. The convergence of the solution is obtained by checking the relative error of 
surface pressure distribution over each iteration, and for this shock-free flows, the number 
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of iterations for a convergent solution is 6. 

The second numerical example is made to a transonic flow around a wing with a 6% 
thick circular arc section of AR = 4 at a free-stream Mach number of 0.908. In order to 
show the capability of shock- fitting, the solutions obtained with and without shock-fitting 
are presented in Figure 7a through Figure 7d. Figures 7a and 7b are the surface Mach 
contours and surface pressure coeffcient contours without shock- fitting, respectively, where 
the shock is diffused but the supersonic flow region is clearly seen in the Figure 7a. Figures 
7c and 7d are the solutions with shock-fitting, where the shock is clearly predicted, lfie 
effect of shock-fitting is self-explanatory from these figures. In order to verify the accuracy 
of the shock -fitting, the calculated results are ploted in Figure 7e along with the other 
reference solution. The calculated pressure distributions compare very well with a lbL> 
FD result 17 and another IE result 18 except the discrepancy at the station near the wing 
tip. The location and the strength of the shock are correctly predicted by the present 
method. For the present transonic flow case, 16 iterations are used to get a convergent 
solution, where the first 10 iterations are used to locate shock and additional iterations are 

used to fit shock. 

The discrepancy near wing tip may be caused by the different tip shapes used in dif- 
ferent computational models, and hence to have different tip- release effects. To investigate 
this effect, the present computation is made for this case with different wing .tip thickness. 
Figures 8a through 8c are the results obtained by tapering off wing tip to 75%, 50 /o and 
0% of the value at root section, respectively. Figures 8a - 8c show the variation of the 
surface pressure coefficients at the station near tip due to the tip- release effect. 

While above two examples are for the flows around sharp leading-edge wings, the next 
two examples present the results for flows around round leading-edge wings. Figures 9a 
through 9c are the calculated results for the transonic flow around a wing with 6% thick 
NACA64A006 section of AR = 4 at a free-stream Mach number of 0.877. The calculated 
pressure at root section is compared with the 2-D experimantal data and the comparison 
shows a good agreement. 

The last numerical example is for the transonic flow around a wing with a symmet- 
ric 10 6% thick NACA64A010A section of AR = 4 at a free-stream Mach number of 0.8. 
The symmetric NACA64A010A wing section is obtained by averaging the upper and lower 
surface coordinates of NACA64A010A airfoil. The calculated pressure distribution is com- 
pared with the TSD finite-difference solution 20 as shown in Figure 10c, while Figures 10a 
and 10b are local Mach and surface pressure contours, respectively. The comparison shows 
that the two solutions have good agreement in terms of both location and strength of the 
shock. 


II-5. CONCLUDING REMARKS 

An integral equation field-panel method based on the full- potential equation formu- 
lation for transonic flows is presented. The method can be extended for handling flows 
around general three dimensional configurations, although only non-lifting cases are im- 
plemented and tested. The calculated wing surface pressure distribution is reasonably 
correct including the location and the strength of the shock. As an alternative to the 
grid refinement, the shock-fitting technique applied here does give a correct shock both in 
location and in strength. The present IEM is effective in terms of the number of iterations 
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compared with those of FDM and FVM, althought the computational cost per IE itera- 
tion is more expensive than those of FDM and FVM. The large grid size (for example, 
Ax = 0.1c Ay = 0.1c, Az = 0.25c) used here makes the scheme even more efficient. I he 
CPU time for a typical transonic flow case is around 165 seconds on Cray- YMP computer. 
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PART III: MPP IMPLEMENTATION AND PERFORMANCE ANALYSIS 


SUMMARY 


This part presents the massively parallel processing (MPP) implementation of inte- 
gral equation calculations and the performance analysis. For both two- dimensional (~D) 
and three-dimensional (3D) flows, integral equation panel method computer codes are 
converted into parallel CM-FORTRAN codes. Comparative study of computational per- 
formance of CM-2 / CM-5 and Cray-YMP computers is made. The performance results 
are obtained on CM-2 with 8k, 16k and 32k processors, and on CM-5 with 32, 64 and 
128 nodes along with those on Cray-YMP with a single processor. The comparison of the 
performance indicates that the parallel CM-FORTRAN code out-performs the equivalent 
serial FORTRAN code for most cases tested. 
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III-l. INTRODUCTION 


In recent years, the processors of conventional vector supercomputers seem to be 
approaching the limit in computational speed inherent in their technology. However, the 
need for even faster computations continues to grow. As a consequence, massively parallel 
computers are being developed as a possible solution. Massively parallel computers, such 
as CM-2 and CM-5, are families of parallel computer architectures which may provide 
orders of magnitude improvement in computation performance in the near future over 
today’s fastest conventional supercomputer. In fact the CM-5 computer with a maximum 
16k nodes installed is a 2 TFLOPS computer in theory. 


Computational fluid dynamics (CFD) is one of the areas which need super-fast com- 
putational power. The massively parallel computers has potential to become the main 
computational tool for CFD; it may replace the conventional supercomputers m the near 
future. The integral equation panel method has the nature for processing data in aparallel 
computing environment, and hence it is important to investigate this nature. 

III-2. CM-2/CM-5 AND CM-FORTRAN 


The Connection Machine CM-5 system is a scalable distributed-memory multiproces- 
sor system. The major hardware elements of the system include front-end computers to 
provide developing and execution environments and a parallel processing unit, which con- 
sists of multiple nodes, to execute parallel operations. It supports both the blMD ( ^ n S 1< r 
Instruction Multiple Data) data parallel and MIMD (Multiple Instruction Multiple Data) 
message passing programming models. The maximum possible configuration tor a CM-5 
system is 16Ar nodes, where k = 1024. The CM-5 used under the present study at . S 
Ames Research Center has 128 nodes installed. Each node has 32 MB of memory one 
SPARC processor and four vector processors for a theoretical peak performance ot 128 
MFLOPS. Therefore the CM-5 with 128 nodes has a theoretical peak performance ot lb 

GFLOPS. 

Similar to the CM-5, the CM-2 is another MPP machine which was built before the 
CM-5. The CM-2 supports SIMD data parallel computing mode. The parallel processing 
unit contains up to 64k single-bit physical processors. The CM-2 used under this study 
at NASA Ames Research Center has 32k processors. The aggregate peak performance tor 
this 32k CM-2 is about 2 GFLOPS. 

The CM-FORTRAN language is an implementation of FORTRAN 77 supplemented 
with array-processing extensions from the standard FORTRAN 90. These array-processing 
features map naturally onto the data parallel (for SIMD model of parallel programming) ar- 
chitecture of the CM-5 system, since the CM- FORTRAN I aUows Mray ^lements i to be eval- 
uated simultaneously. The most important difference of CM-FORTRAN from FORTRAN 
77 is the treatment of entire arrays as objects, thus explicit indexing m CM-t(JKlKAJN 
is not always necessary. For example, it is not necessary to write Do-Loops or other such 
control constructs to have the operation repeated for each element of arrays. On the other 
hand for message passing models of parallel programming (MIMD), the program may be 
written in FORTRAN 77 along with message passing routines. 


III-3. INTEGRAL EQUATION PANEL METHODS 


For incompressible flows, the governing equation is given by the Laplace equation 
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which can be obtained from Eq. (20) after setting the compressibility to zero, 


V 2 $ = 0 


( 30 ) 


where $ is the velocity potential, V = V$. 

The integral equation solution of Eq. (30) for source panel method, in terms of velocity 
field ( V = V$), is given by 


V(x,y,z ) = Voo 


4x 


LG NG 


££//. 


qiM^hO 

d? 


&dds(£, Tjf £) 


(31) 


where the subscript oo refers to the free-stream condition, LG x NG is the total number 
of panels; qij is the wing-surface source distribution, which is unknown to^be determined 
by applying boundary condition; ds is the infinitesimal surface area; the f ( i is defined by 
e* d = d/ljj, and where d = (x — £)i + (y — q)j + (z — C)fc. It should be mentioned that 
for two-dimensional flows, the surface integrals of Eq. (3l) become line ntegrals and the 
surface panels become line segments. 

The wing-surface zero-normal-velocity boundary condition is applied at each control 
point of all panels, 

V(x,y,z)-n = 0 on CP^k '■ I = \,NL\K = 1,NG (32) 


Applying Eq. (31) to Eq. (32), a system of equation is obtained, 

MM = {£} < 33 ) 

where [A] is N x N aerodynamic influence coefficient matrix, and N = LG x NG\ { 5 } is 
a N x 1 unknown vector matrix containing qj for j = 1 to AT; and {i?} is a .V x 1 known 

vector matrix which is contributed from Voo- 

The solution procedure of the problem using source panel method involves three major 
steps: ( 1 ) evaluation of integrals for N 2 times to construct matrices [A] and also [B]] (2) 
solving the resulting dense linear system of Eq. (4); and (3) post-processing of aerodynamic 
calculations. It should be noted that the Step (1) involves evaluating a large number of 
integrals. The total number of integrals can be very large for aerodynamic problems, for 
example, it can be in the order of 10 8 if LG x NG = 100 x 100 . An important feature of 
the Step (1) is that the calculation for each (x,y,z) and each (£,*?,<) can be performed 
simultaneously for all (x,y,z) and all (£,t?,C)- This feature of panel method calculation 
leads itself in a natural way for processing data in a SIMD parallel computing environment. 
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HI-4. PARALLEL FORTRAN IMPLEMENTATION 

The serial FORTRAN codes axe converted into parallel CM-FORTRAN codes where 
the CMAX translator is used to partially convert the 3D code. However the most of the 
conversion is done manually. 

List la is the program list of the subroutine for evaluating influence coefficient matrix, 
[Al, and the matrix {B} in serial FORTRAN. It is noted that this subroutine is nothing but 
a two-level Do-loops with a If- conditional statements, which provides for evaluating each 
element of [Al and {B}. When the code is in execution on Cray-YMP, the vectonzation 
of the inner Do-loops is automatically done through the vectonzation capability of the 
FORTRAN 77 compiler. 

List lb is the parallel CM-FORTRAM euqivalent of List la. A few things should 
be mentioned. First, no Do-loop is seen here since in CM- FORTRAN entire arrays are 
treated as obiects and array elements are evaluated simultaneously. Second, the conch- 
tionrd If^tatement is represented in the form of WHERE- ELSEWHERJE-ENDWHERE 
format which allows the conditional processing to be done m parallel, Third, the tM 
FORTRAN intrinsic function SPREAD is used here to create two-dimensional arrays from 
one-dimensional arrays by duplicating the elements in either row- or -column- -direct xons as 
desired for easy implementing parallel processing of statements like X If A , J ) — A AjU + 
X3(JV Fourth, temporary scale variables, such as DYJ, DXJ and so on in serial FOR- 
TRAN becomes two-dimensional arrays in CM-FORTRAN in order to implement parallel 
processing. However, such arrays increase the total memory requirement of CM- code sig- 
nificantly as compared with the serial FORTRAN code. For example, within the P re scn 
investigation it has been found that the CM-FORTRAN code with 4096 panels exceede 
the 2GB memory limit of the CM-2 computer with 16k processors. 

The dense linear system of Eq. (33) is solved by the direct Gauss elimination method 
for 2D and 3D cases and by indirect Jacobi’s iterative method for 2D case also. List 2 a 
is the program list of serial FORTRAN version of Jacobi’s iterative method while List 
2b is the parallel CM-FORTRAN version. From the list, it can be seen that the Jacobi s 
iteration method on CM-FORTRAN is fully parallelized. 

List 3a and List 3b are program list of the subroutine for evaluating aerodynamic 
influence coefficients for three-dimensional flows in serial FORTRAN and parallel CM- 
FORTRAN version, respectively. Similar to List la and List lb, the Do-loops m serial 
version are replaced by data parallel statement, and the If-conditional statement is replaced 
by parallel conditional statement. The major difference of 3D program from 2D program 
is that the 3D program is more complicated. 

III-5. PERFORMANCE STUDY 

Figure 11 shows the CPU time for solving linear system using Gauss elimination. On 
the CM-FORTRAN version, the system is solved by calling the Gauss elimination solver 
from the CMSSL library on the CM-2. The comparison shows that the CPU time required 
on CM-2 with 32k processors approaches to that required on Cray-YMP with the increase 
of the problem size. For example, the CPU time required on CM-2 ! with 32k processors .is 
0.125 seconds for N = 32 which is much larger than that for Cray-YMP of 0.00152 seconds 
while this comparison becomes 277 seconds to 192 seconds when N = 2048 Therefore it 
can be expected that when the problem size becomes large enough the CM-2 with 61 k 
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processors will near- or out-perform the Cray -YMP. But on the other hand, it is found 
that the direct method for large system of equations are very expensive for both serial and 
parallel versions. 

Table 1 gives the detailed performance results for 2D calculation on Cray- YMP and 
CM-5 computers where the dense linear system is solved by a much more efficient indirect 
Jacobi’s iterative method. The performance of CM-5 in terms of MFLOPS is the equiv- 
alent Cray- YMP performance. In Table 1, “Mat Coef” refers to evaluating aerodynamic 
enfluence coefficients; “Lin Syst” refers to solving linear system using Jacobi method; and 
“Total” refers to solving entire code. The sets of results from Table 1 have been extracted, 
and are presented in Figures 12-15. 


Figure 12 shows execution time for evaluating aerodynamic influence coefficient ma- 
trix, \Al and the matrix [ B ] on Cray-YMP and CM-5 computers for different numbers 
of panels. It can be seen that the CPU execution time decreases with the increase of the 
number of CM-5 nodes after the size of the problem is large enough to fully use all nodes. 
For example when N = 1024, the CPU time of 0.216 seconds with 32 nodes is reduced 
to 0.114 seconds with 64 nodes, and then is further reduced to 0-061 seconds with 128 
nodes. That is to say that whenever the number of nodes used is doubled, the CPU time 
is almost reduced by a factor of 2 - a near-perfect parallelization. It is also seen that when 
the problem size is large enough the CPU time required on CM-5, even with 32 nodes, is 
significantly (note that Log l0 -axis is used for execution time !) less than that required on 

Cray-YMP. 

Figure 13 is the CPU time for solving linear system using Jacobi iterations. The 
results tell us that, when the N is large enough, CM-5 out-performs Cray- YMP and the 
Jacobi method is very efficient. 


Figure 14 shows the total CPU time for solving entire problem. The results are self- 
explanatory. Figure 15 is a partial reproduction of Figure 14 for performance results on 
Cray-YMP and CM-5 with 128 nodes, and it is represented in terms of MFLOPb. from 
this figure it is clearly seen that the CM-5 performs at about 2 GFLOPS when N- 1024. 
The speed achieved here is very encouraging, which is much faster than that achieved on 

Cray-YMP. 

For three-dimensional flows, the performance results are listed in Table 2 through 
Table 5. Table 2 gives the detailed CPU time results on the Cray-YMP and the CM-5 to 
construct the matrices. It is founded that the CM-5 out-performs the Cray-YMP by small 
margin for N = 24 x 12. When the size is increased to N = 48 x 24 the CM-5 out-performs 
the Cray-YMP by a much larger margin, even with only 32 nodes. 


Table 3 displays CPU time for the Gaussian elimination routine, and it tells us that 
the Cray-YMP out-performs the CM-5 for N = 24 x 12. However its performance relative 
to the CM-5 decreases drastically when N increases; and in fact we can see that the CM-5 
out-performs the Cray-YMP for N = 48 x 24. 


Table 4 lists the CPU time for post processing calculations where the parallel CM- 
FORTRAN version is not fully parallelized. Table 5 is the total CPU times and from the 
table it is seen that for the larger probel tested under this study, the CM-5 out-performs 
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the Cray-YMP. 


III-6. CONCLUDING REMARKS 

Source panel method computational codes for both 2D and 3D incompressible hows 
axe successfully implemented on the MPP computers using CM- FORTRAN language. The 
linear system is solved by both direct Gauss elimination and efficient iterative Jacobi s 
methods. The detailed performance results are obtained and analysed. The parallel CM- 
FORTRAN code achieves a very high performance and for most of the cases tested here it 
out-performs Cray-YMP supercomputer, which is very encouraging. Through this study, 
it seems that the integral equation method is appropriate for parallel computation and a 
high performance can be achieved. 
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Figure 2b. Unsteady C p history at z = 4.44, acceleration motion. 





Figure 2d. C p distribution at M 0 (t) — 0.8 over wing surface, acceleration motion 







X 


Figure 3b. Unsteady C p history at 2 = 4.44, pitching motion 




3d. C p distribution at a(t) = 10° over wing surface, pitching motion. 




Figure 4. Computational model. 
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surface panels 

6% circular arc section rectangular wing, AR=4 
wing tip 



wing root 


Figure 5. Surface panels. 
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without shock-fitting 
6% circular arc rectangular wing 

AR=4, M=0.908, ALPHA=0 



Level M 

A 1.06 

9 1.04 

8 1.02 

7 1.00 

6 0.97 

5 0.95 

4 0.93 

3 0.91 

2 0.89 

1 0.87 


Figure 7a. Effect of shock-fitting, Mach contours, withou shock- fitting. 
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Figure 7c. Effect of shock-fitting, 









Figure 7e. Surface pressure coefficients, with shock-fitting. 
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.6 


6% CIRCULAR ARC SECTION, RECTANGULAR WING 
TIP THICKNESS = 75% OF ROOT THICKNESS 
AR=4, M=0.908, ALPHA=0 



Figure 8a. Tip-release effect, tip thickness = 75% root thickness. 
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6% CIRCULAR ARC SECTION, RECTANGULAR WING 
TIP THICKNESS = 50% OF ROOT THICKNESS 
AR=4, M=0.908, ALPHA=0 




0.00 0.25 0.50 0.75 1 .00 

Figure 8b. Tip- release effect, tip thickness = 50% root thickness. 
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6% CIRCULAR ARC SECTION, RECTANGULAR WING 
ZERO THICKNESS AT TIP 
AR=4, M =0.908, ALPHA=0 



Figure 8c. Tip-release effect, tip thickness = 0. 
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NACA64A006,AR=4,M=0.877,ALPHA=0 



Figure 9b. Transonic flow, surface pressure coefficient contours. 
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Figure 9c. Transonic flow, surface pressure coefficients. 
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NACA64A010A(S),AR=4,M=0.80,ALPHA=0 




Level M 
A 1.08 
9 1.03 

8 0.98 

7 0.93 

6 0.88 
5 0.83 

4 0.78 

3 0.73 

2 0.68 
1 0.63 


Figure 10a. Transonic flow, surface local Mach contours. 
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NACA64A010A(S),AR=4,M=0.80,ALPHA=0 



Figure 10b. Transonic flow, surface pressure coefficient contours. 
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Figure 10c. Transonic flow, surface pressure coefficients. 
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Figure 11. CPU time for solving linear system using Gauss elimination. 
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CPU time (sec.) 



Figure 12. CPU time for evaluating aerodynamic enfluence coefficients. 
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CPU time (sec.) 



Figure 13. CPU time for solving linear system using Jacobi method. 


54 


CPU time (sec.) 



Figure 14. CPU time for solving entire problem. 
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Performance (M FLOPS) 



Figure 15. Performance for solving entire problem. 
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Table 1. The detailed computational performance results 


Task/Size(N) 

Cray-YMP 

32-node CM5 

64-node CM 5 

128-node CM5 

Task 

N 

Time(s) 

klMIMlfe! 

Time(s) 

MFLOPS 

Time(s) 

MFLOPS 

Time(s) 

MFLOPS 

Mat Coef 

32 

0.0015 

144 

0.008 

27 


27 

0.008 

27 

Lin Syst 

32 

0.0004 

31 

0.012 

1 

0.011 

1 

0.011 

1 

Total 

32 


33 

0.027 

9 

0.027 

9 

0.027 

9 

Mat Coef 

64 

fSTigw 

169 

0.009 

96 

0.008 

108 

0.008 

108 

Lin Syst 

64 

0.0012 

34 

0.014 

3 

0.012 

3 

0.012 

3 

Total 

64 

0.0166 

56 

0.031 

30 

0.028 

33 

0.028 

33 

Mat Coef 

128 

0.0196 

177 

0.011 

Kola i 


385 

0.009 

385 

Lin Syst 

128 

0.0040 

33 


7 


9 

0.012 

11 

Total 

128 

0.0470 

79 


98 


116 

0.030 

124 

Mat Coef 

256 

0.0773 

181 




932 

0.011 

1272 

Lin Syst 

256 

0.0097 

31 

0.020 

15 

0.015 

20 

0.012 

25 

Total 

256 

0.1370 

107 : 

0.054 

271 

0.040 


0.034 

431 

Mat Coef 

512 

0.3050 

183 

0.061 

915 



0.022 

2537 

Lin Syst 

512 


31 

0.050 

23 


37 


63 

Total 

512 


131 

0.143 

408 




1060 

Mat Coef 

1024 


184 

0.216 

1039 

0.114 

1969 


3680 

Lin Syst 

1024 

0.1420 

31 

0.106 

42 

0.060 

73 

0.036 

122 

Total 

1024 

1.6100 

144 

0.391 

593 


1083 


1870 
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Table 2. CPU time in seconds for constructing matrices. 


si*e, N= 



Cray-YMP 

0.44 

8.25 


U. 24 

2.34 

64-node CM5 

0.17 

1.20 

1 28-node CM5 

0.12 

0.65 


Table 3. CPU time in seconds for Gaussian elimination. 


size, N= 

288(=24x1 2) 


Cray-YMP 

0.58 

33.85 


1.43 

857 

64-node CM5 

2.11 

7.05 

128-node CM5 

1.65 

7.66 
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Table 4. CPU time in seconds for post processing. 


size, N= 



Cray-YMP 

0.24 

4.30 

32 -noae umd 

2.18 

9.45 

64-node CM5 

2.21 

9.61 

1 28-node CM5 

2.18 

9.61 


Table 5. Total CPU time in seconds for incompressible flow. 


” size, N= 

288(=24x1 2) 

1152(=48x24) 

Cray-YMP 

1.33 

46.60 


3.86 

20 55 

64-node CM5 

4.53 

18.01 

M 

4.00 

18.08 
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SUBROUTINE MATELM 
PARAMETER (N= 3 2 , M=3 3 ) 

DIMENSION X (M) ,Y(M) ,XC(N) ,YC(N) ,DS(N) , FN(N,N) 
1 , FT (N , N) , RHS (N) , SDE (N) ,CI(N) ,SI(N) 

COMMON X , Y , XC , YC , DS , FN , FT , RHS , PI , CPI , Cl , SI 
1 , UINF , VINF , SDE 
DO 2 K=1 , N 
DO 1 J=1,N 

IF (K .EQ. J) FN (K, J) =2 . 0*PI 
IF (K. EQ. J) FT (K, J) =0 . 0 
IF (K. EQ. J) GOTO 1 
DYJ=SI ( J) *DS ( J) 

DXJ=CI (J) *DS ( J) 

SPH=DS ( J) *0.5 
XD=XC (K) -XC ( J) 

YD=YC (K) -YC ( J) 

RKJ=SQRT (XD*XD+YD*YD) 

BK J=ATAN 2 ( YD , XD ) 

ALJ=ATAN2 (DYJ, DXJ) 

GKJ=ALJ-BKJ 
ZIK=RKJ*COS (GKJ) 

ETK=-RKJ*SIN (GKJ) 

R1S= ( ( ZIK+SPH) **2) +ETK*ETK 
R2S=( (ZIK-SPH) **2) +ETK*ETK 
QT=ALOG (R1S/R2S) 

DEN=ZIK*ZIK+ETK*ETK-SPH*SPH 
GNM=ETK*DS ( J) 

QN=2 . 0* AT AN 2 (GNM, DEN) 

UKJ=QT*CI ( J) -QN*SI ( J) 

VKJ=QT*SI ( J) +QN*CI ( J) 

FN (K, J) =-UKJ*SI (K) +VKJ*CI (K) 

FT (K, J) =UKJ*CI (K) +VKJ*SI (K) 

1 CONTINUE 

RHS (K) =UINF*SI (K) -VINF*CI (K) 

2 CONTINUE 
RETURN 
END 


List la. Constructing matrices, 2D problem, in serial version. 
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SUBROUTINE MATELM 
PARAMETER (N=3 2 , M=3 3 ) 

DIMENSION X (M) ,Y(M) ,XC(N) ,YC(N) , DS (N) , FN (N , N) 

1 , FT ( N , N ) , RHS (N) ,SDE(N) ,CI(N) ,SI(N) 

2 , DYJ (N , N) , DX J ( N , N ) ,SPH(N,N) ,XD(N,N) , YD(N,N) 

3 , RKJ (N,N) , BKJ (N,N) ,ALJ(N,N) ,GKJ(N,N) , ZIK(N,N) 

4 , ETK (N, N) , R1S (N , N) ,R2S(N,N) ,QT(N,N) , DEN (N,N) 

5 , GNM(N,N) ,QN(N,N) ,UKJ(N,N) ,VKJ(N,N) 

6 , DS2 (N, N) , CI2 (N, N) ,SI2(N,N) 

7 , XC2 (N , N) , YC2 (N,N) ,XC3(N,N) ,YC3(N,N) 

8 , SI3 (N, N) , CI3 (N,N) 

LOGICAL MAIN_DIAG(N,N) 

COMMON X , Y , XC , YC , DS , FN , FT , RHS , PI , CPI , Cl , SI 

1,UINF,VINF,SDE 

XC2 = SPREAD (XC, DIM=1 , NCOPIES=N) 

YC2 = SPREAD (YC, DIM=1 , NCOPIES=N) 

XC3 = SPREAD (XC, DIM=2 , NCOPIES=N) 

YC3 = SPREAD (YC , DIM=2 , NCOPIES=N) 

SI2=SPREAD ( SI , DIM=1 , NCOPIES=N) 

CI2=SPREAD (Cl , DIM=1 , NCOPIES=N) 

SI 3=SPREAD ( SI , DIM=2 , NCOPIES=N) 

Cl 3 =SPREAD ( Cl , DIM=2 , NCOPIES=N) 

DS2=SPREAD (DS , DIM=1 , NCOPIES=N) 

MAIN_DIAG=DIAGONAL ( SPREAD ( . TRUE . , 1 , N) , . FALSE . ) 
WHERE (MAIN_DIAG) 

FN = 2.0 * PI 
FT - 0.0 
ELSEWHERE 

DYJ = SI2 * DS2 
DXJ = CI2 * DS2 
SPH = DS2 * 0.5 
XD = XC3 - XC2 
YD = YC3 - YC2 
RKJ=SQRT(XD*XD +YD*YD) 

BKJ=ATAN2 (YD, XD) 

ALJ=ATAN2 ( DYJ , DXJ ) 

GK J = ALJ - BKJ 
Z I K=RK J * COS ( GK J ) 

ETK=-RKJ*SIN (GKJ) 

R1S= ( ( ZIK+SPH) **2 ) + ETK*ETK 
R2S=( (ZIK-SPH) **2) + ETK*ETK 
QT=ALOG (R1S/R2S) 

DEN=ZIK*ZIK + ETK* ETK - SPH*SPH 

GNM=ETK*DS2 

QN=2 . 0* AT AN 2 (GNM, DEN) 

UKJ=QT*CI2-QN*SI2 

VKJ=QT*SI2+QN*CI2 

FN=-UKJ*SI3+VKJ*CI3 

FT=UKJ*CI3+VKJ*SI3 

ENDWHERE 

RHS=UINF*SI-VINF*CI 

RETURN 

END 

List lb. Constructing matrices, 2D problem, in parallel version. 
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SUBROUTINE JBINT (A, B) 

PARAMETER (N=128 , M=128 ) 



C SOLVE AX=B USING APPROXIMATE JACOBI ITERATIONS 

C SERIAL FORTRAN VERSION 



DIMENSION A(N,N) ,B(N) ,X(N,100) 

INTEGER VAR 
MAXITER=50 
TOL=0. 001 
AX0=0 . 0 
XMAXDIF=0 . 0 
DO 1000 I = 1 , N 
X(I,1) = 0.0 

1000 CONTINUE 

K = 1 

66 K= K+ 1 

DO 200 I = 1 , N 

DO 300 J=1 , N 
VAR - K-l 

IF (J.EQ.I) GOTO 300 
AX0=A(I, J) *X(J,VAR) + AXO 
300 CONTINUE 

X ( I , K) = 1/A(I,I)*(B(I) -AXO ) 

XDIF = ABS (X(I ,K) -X(I,K-1) ) 

IF (XDIF.GT.XMAXDIF) XMAXDIF=XDIF 
AX0=0 . 0 

200 CONTINUE 

IF (XMAXDIF.LT. TOL) THEN 
MAXK = K 
GOTO 99 
ENDIF 

XMAXDIF =0.0 

IF (K.LT. MAXITER) GOTO 66 

PRINT* , ' NOT CONVERGENT YET AFTER ITERATIONS MAXITER 
RETURN 

99 CONTINUE 

DO 400 I=1,N 
B ( I ) =X ( I , MAXK ) 

400 CONTINUE 

RETURN 
END 


List 2a. Jacobi method in serial version. 
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SUBROUTINE jbite 
PARAMETER (N=128 , M=128 ) 


SOLVE AX=B USING APPROXIMATE JACOBI ITERATIONS 
PARALLEL CM- FORTRAN VERSION 


DIMENSION A(N,N) ,B(N) ,X(N,100) ,ax0(N) ,C(N) 

INTEGER VAR 
REAL XDIF100 (n) 

COMMON / BLK 2 / A 
COMMON/ BLK3/B 
maxiter = 50 
tol = 0.001 
x( : , 1) = 0.0 
k = 1 

66 k = k + l 
var = k - 1 

FORALL (I=1:N) AXO (I) =DOTPRODUCT(A(I , : ) , X( : , VAR) ) -A ( I , I ) *X (I , VAR) 
forall (i=l:n) c(i)=a(i,i) 
x(:,k)=1.0/c * (b-axO) 

xdifl00(l:n) = abs(x(:,k) - x(:,k - 1)) 

XMAXDI F=MAXVAL ( XDI F 1 0 0 ) 

IF (xmaxdif .LT. tol) THEN 
maxk = k 
GOTO 99 
ENDIF 

xmaxdif = 0.0 

IF (k .LT. maxiter) GOTO 66 

PRINT *, 'NOT CONVERGENT YET AFTER ITERATIONS maxiter 
RETURN 
99 CONTINUE 

b = x( : ,maxk) 

RETURN 

END 


List 2b. Jacobi method in parallel version. 
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SUBROUTINE VELWING ( IVELCT , IWG , IG , JG , KG) 
COMMON / BLKO 1/X(25,13) ,Y(25,13) ,Z(25,13) 


DO 1 JS=1 , NC 
JS1=JS+1 
DO 2 IS=1 , NR 
IS1=IS+1 
X1=X(IS, JS) 


XC= (X1+X2+X3+X4 ) /4.0 


DO 11 JR=1 , NC/2 

JR1=JR+1 

DO 12 IR=1 , NR 


XF=0 . 25* (X ( IR, JR1) +X (IR1 , JR) +X (IR f JR) +X ( IR1 , JR1) ) 


DX=XF-XC 


DIST=SQRT (DX*DX+DY*DY+DZ*DZ) 

IF(DIST.LT.FARFD) THEN 
IF(DIST.LT. 0.0001) THEN 
UC=0 . 5*UNX (IS, JS) 

VC=0. 5*UNY (IS, JS) 

WC=0 . 5*UNZ (IS, JS) 

ELSE 

CALL VWS ( XI , X2 , Z1 , Z 3 , Y1 , XF , YF , ZF , IS , JS , UC , VC , WC ) 
END IF 
ELSE 

AREAXZ=ABS ( (X2-X1) * (Z3-Z1) ) 

XYN=UNX(IS, JS) /UNY(IS, JS) 

ZYN=UNZ (IS, JS) /UNY(IS, JS) 

FACXZS=SQRT ( 1 . 0+XYN*XYN+ZYN*ZYN) 

AREAS=FACX Z S * AREAX Z 

CONSTFF=OPI4 *AREAS / (DIST*DIST*DIST) 

UC=CONSTFF*DX 
VC=CONSTFF*DY 
WC=CONSTFF*DZ 
END IF 


NBA= ( JS-1) *NR+IS 

A ( KEQ , NBA) =A (KEQ , NBA) - (UC*UNX ( IR, JR) 

+ +VC*UNY ( IR , JR) 

+ +WC*UNZ (IR, JR) ) 

12 CONTINUE 
11 CONTINUE 
2 CONTINUE 
1 CONTINUE 
RETURN 
END 


List 3a. Constructing matrices, 3D problem, in serial version. 
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SUBROUTINE velwing (ivelct , iwg, ig, jg,kg) 
include ' /usr/cm/ include/cm/CMF_def s . h' 

COMMON / BLKO 1/X(25,13) , Y (25 , 13 ) , Z (25 , 13 ) 

real unxm3 (24 , 12 , 24) ,vnxm3 (24, 12,24) , wnxm3 (24 , 12 , 24 ) 

unxm3 = spread (unx( :nr, :nc) ,dim=3,ncopies=24) 

xcm3 = spread (xcm,dim=3,ncopies=24) 


xcm4 =spread ( xcm3 , dim=4 , ncopies=6 ) 


xfm3 = spread (xfm,dim=l,ncopies=12) 


xfm4 = spread (xfm3 ,dim=l,ncopies=24) 
dxm4 = xfm4 -xcm4 


distm4 = sqrt (dxm4*dxm4+dym4*dyin4+dzin4*dzm4) 

where (distm4 • It. farfd) 

where (distm4 .LT. 0.0001) 
ucm4 =0.5* unxm4 


elsewhere 

xynm4 = unxm4/vnxm4 
zynm4 = wnxm4/vnxm4 

facxzsm4 = sqrt (1.0 + xynm4 * xynm4 + zynm4 * zynm4) 
ddxm4 = ddxm4 
ddzm4 = ddzm4 

fac4 = opi4*facxzsm4*abs(ddxm4*ddzm4) /6 . 0 

vwx4 = f fxll+f fx21+f fx31+f fx41+ 

& + f fx44+f fxl5+f fx25+f fx35+f fx45 


ucm4 = vwx4*fac4 


endwhere 

elsewhere 

zynm4 = wnxm4/vnxm4 

facxzsm4 = sqrt (l+xynm4*xynm4 + zynm4*zynm4) 
contm4 = opi4*areasm4/ (distm4*distm4*distm4) 
vcm4 = contm4*dym4 
endwhere 

forall ( jr=l:nc/2,ir=l:nr, js=l:nc,is=l:nr) 

& a(ir+( jr-1) *nr, is+( js-1) *nr) = a(ir+( jr-1) *nr, is+( js-l) *nr) 
& - ( (ucm4 (is, js, ir, jr) 

& *unxd4(is, js,ir, jr) )+ 

& (vcm4 (is, js,ir, jr) *vnxd4 (is, js,ir, jr) )+(wcm4(is, js, ir, jr) 

& *wnxd4 (is, js, ir, jr) ) ) 
return 
end 


List 3b. Constructing matrices, 3D problem, in parallel version. 
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Development of A Shock-Fitting Field-Panel Method for 3D Transonic Flows 

Hong Hu 

Hampton U niversity 
Department ol Mathematics 
Hampton, Virginia 23GGS, USA 

ABSTRACT 

The paper presents the development of a shock-fitting field-panel method foi tlnee- 
diemnsion (3D) transonic flows. In this method, the full-potential equation, written in 
the form of the Poisson’s equation, is solved by integral equation field- panel method. 
The solution consists of a wing surface source panel integral term, a field-volume panel 
integral term of compressibility over a small limited domain, and a shock panel integial 
term. Due to the non-linearity of flows, solutions are obtained through an iterative 
procedure. Instead of using a field-panel refinement procedure, a shock-fitting technique 
is used to fit the shock. Finally, numerical examples are provided to demonstrate the 

accuracy of the method. 
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1. INTRODUCTION 


The finite- difference method (FDM) and finite-volume method (FVM) for solving 
transonic flows have been well developed during the past twenty years. Although the 
Navier-Stokes equation formulation for the transonic flow computations has been under- 
stood as the best model and the FDM and FVM are successful in dealing with transonic 
flows, the computation of the unsteady Navier-Stokes equations over complex three- 
dimensional configurations is very expensive, particularly for time-accurated unsteady 
flow computations. There are also major technical difficulties in FDM and FVM for 
generating suitable grids for complex three-dimensional aerodynamic configurations. 

The experience has shown that rather accurate solutions can been obtained for cer- 
tain transonic flows using the inviscid modeling of the full-potential equation. For tian- 
sonic flows without strong shocks and without massive separations, the full-potential 
equation is an adequate approximation to the Navier-Stokes equations. The integial 
equation method (IBM) for the potential equation is an alternative to the FDM and 
FVM. Moreover, the IEM has several advantages over the FDM and FVM. The 1EM 
involues evaluation of integrals, which is more accurate than the FDM and FVM, and 
hence a coarse grid can been used in IEM. The IEM automatically satisfies the lar- field 
boundary conditions and therefore only a small limited computational domain is ne< ded. 
The generation of the three-dimensional grid for complex configuration is not difficulty m 
the IEM, since the mapping from physical plane to computational plane is not required. 

During tlu; past few yeitrs integral equation methods for transonic flows have boon 
developed by Piers and Sloof (1979), Tseng (1984), Erickson and Strande (1985), Kandil 
and Hu (1988) and Ogana (1989) for steady airfoils, by Tseng (1984), Kandil and Yates 
(1986), Madson (1987) and Sinclair (1988) for steady wing and aircraft configurations, 
and by Hounjet (19S1) and Kandil and Hu (1990) for unsteady airfoils by solving either 
full-potential or transonic small disturbance equations using both surface and field panels. 
The shock capturing technique was applied in these methods. 1 he method of Kandil 



and Hu (1988) solves the full-potential equation for two-dimensional transonic flows, 
where both shock-capturing and shock-fitting techniques are applied. The capability of 
capturing shocks with shock-capturing technique and improvement of the shock with 
shock-fitting technique was presented by Kandil and Hu (1988). The method is efficient 
and engineering accurate. In the present paper a method foi computing steady 3 D flows 
is presented along with numerical examples to demonstrate the capability, accuracy and 
the potential of the present IE scheme for subsonic and transonic flow computations. The 
method is the extension of the steady 2D method of Kandil and Hu (108S) and is being 
extented to unsteady 3D transonic flow computations. In fact, the present method has 
been applied to unsteady transonic flows around a zero-thickness wing by Hu (1992, 1993) 
using vortex panel method. In order to use a coarse grid, which is particularly important 
in 3D calculations, the shock-fitting technique is applied to the- present transonic flow 

calculations. 

2. FIELD-PANEL FORMULATION 
2.1 Governing Full-Potential Equation 

The non-dimensional steady full-potential equation is given by: 

0H> dH ( 1 ) 

dx* + Oy- + 0z l 


with 


1 Op dp Op 

G = (w-X t- u T 1- ) 

p Ox Oy Oz 


and 


[1 + 


^ ( 1 -u a -i> a -u> a )]= i T 


( 2 ) 


(3) 


where the characteristic parameters, Pvn «oo and c have been used; a is the speed of the 
sound, p the density, and c the wing mot-chord length; and <I> is the velocity potential 
(V$ — V — (?/, v,w)), G the compressibility, and k the gas specific heat ratio. 
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Equation (1) is not in the conservative form but in the form of the Poisson’s equa- 
tion. By writting the full-potential equation in the Poisson s form, the nonlinearity of 
the transonic flows can be treated as non- homogeneity and in terms of the IE solution, 
this non-linearity is represented by field volume integral term. And hence the classi- 
cal surface- panel method can be extended into field-panel method for non-linaer hows. 
The experience has shown that such non-conservative formulation has produce accuiate 
solutions as long as the shock is not very strong. 


2.2 Boundary Conditions 

The general boundary conditions are surface no-penetration condition, Kutta con- 
dition, infinity condition, and wake kinematic and dynamic conditions. Foi the pxesent 
non-lifting flows, the only surface no-penetration condition and infinity condition are 
needed and they are given by: 

V ■ fig = 0 on g(r) = 0 (4) 

and 

V4* — ♦ 0 away from g(r) = 0 and w(r) = 0 (5) 

where fl tJ is the unit normal vector of the wing surface, <j(r) = 0. 

2.3 IE Solution 


By using the Green’s theorem, the integral equation solution of Eq. (1) in terms of 
the velocity field is given by 


P(x,y,c) = V 0 


hSSr ^ 61 ^’ 1 ' 0 

rj J i 

h!L 


(G) 


d* 


7.s((,'/,C) 

d i 


Ct/ds(£, //, C) 
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where Voo is the free-stream velocity; q is the surface source distribution; the subscript, 
5, refers to the shock surface; ds is the infinitesimal surface area; the vector d is given by 
d - (x - 0* + (y ~ *?)/ + (z ~ Qk] and e d is defined by t d = d/\d\. It can be seen that the 
infinity condition, Eq. (5), is automatically satisfied by the integral equation solution, 
since the integrals become zero when d is large enough. 

2.4 Field-Panel Discretisation 

The formulation presented here can be easily extended to general lifting flows by in- 
cluding surface and wake vortex-panel integral terms, although the present computations 
are only made to symmetric non-lifting flows. In this non-lifting computational model, 
the wing surface is represented by a number of uniform rectangular source panels. A uni- 
form rectangular parallelopiped type of field- volume panels are also used throughout the 
flow field. Constant surface and volume source (q and G) distributions are assumed over 
wing / shock surface panels and field volume panels. The discretized integial equation 
solution in terms of surface and field-volume panels then becomes 


= Vc 


LG NG 


i _ ' r r i 

i=u= i J J,j ' k 
LVMVNV f r r 1 

MS NS . . j 

+ j ;EL«W/ S 

j = lk=i J J ^- k 


( 7 ) 


where the indices, i, j and k refer to the surface and field panels; LG X NG is the total 
number of wing surface panels; LV x A IV x NV is the total number of field panels; and 
MS x NS is the total number of shock surface panels. A sketch of the computational 
model is given in Figure 1, while the detailed wing surface’ panelling is given in Figure; 2 



where the exact number of wing surface panels is shown. 


3. COMPUTATIONAL SCHEME 


3,1 Iterative Scheme 


Due to the nature of the non-linearity of transonic flows, solutions are obtained 
through an iterative procedure, where the wing surface source strength and the com- 
pressibility over selected volume elements are updated through each iteration. The solu- 
tion procedure follows the successful form of Kandil and Hu (198S) of two-dimensional 
computations. Here only the treatment of shocks for transonic flow is described. 

3.2 Shock-Fitting Technique 


It should be mentioned that mathematically the second (volume) integral term of 
Eq. (7) includes all compressibility eifects including shock discontinuity. Since a relative 
coarse grid is used in the present IE computational domain where only 10 field panels are 
used over the wing chord, the contribution of the shock discontinuity is extracted from this 
volume integral term and it is represented explicitly by the third integral term of Eq. (7). 
It is very important to use coarse grid in 3D calculations, since the integral calculations 
over 3D field panels are very expensive. The strength of shock panels, </<,-, is equal to the 
difference of normal velocity across the shock. This can be shown by integrating Eq. 
(1) over an infinitesimal volume around an infinitesimal area of the shock surface and 
applying the divergence theorem, one gets 

A(V$) = V 2n - - Ge (8) 


where 6 is the infinitesimal thickness normal to the shock surface. By letting Ge = 
and using Rankine-Hugoniot relation, one finally obtains 


<ls = . 


(k - + 2 

(k + 1 )M'f n 


-i]v ln 


( 9 ) 


where the subscripts 1 and 2 refer to the conditions ahead and behind of the shock, 
respectively; and the subscript n refers to the normal component to the shock. The 
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purpose to use Rankine-Hugoniot relation is to introduce the effect of entiopy change 
across the shock sinec the full- potential formulation uses isentiopic flow assumption 


which is not true in the shock region. 

The constantly distributed, piece-wise continuous (in flow direction) oblique shock 
panels are used. The slope of shock panels is determined by the so called 9 /? M 

relation as given by 


ianO = 2 cotfi [ 


Mfxin 2 ft — 1 1 

Mf ( 7 + cos2p) + 2l 


where 0 is the flow- deflection angle, and \i is the shock angle. 


( 10 ) 


In the present calculation, the shock panel term, the last term of Lq. (7), becomes 
active only after the sonic line (and hence the shock location) is fixed. In other words, 
the shock-capturing technique is first used to locate the shock, where the Munnun-Cole 
type- difference scheme is used in consistent with the mixed-nature of tiansonic flows. 
The use of the Murman-Cole scheme is equivelent to the introducing of the artificial 


dissipation. The use of this artificial dissapation scheme within a shock-fitting scheme 
seems contradictory since some of their effects will cancel each otlici. But if we considci 
the shock-fitting as the' way to give a correct invisciel shock and the Muiman-Cole scheme 
as the way to give the artificial viscous effect, them the use of the Murman-Cole scheme 
with shock-fitting scheme will give a correct viscous shock, this is what it should be. 


4. NUMERICAL EXAMPLES 


The presesnt scheme is applied to rectangular wings of symmetric circular arc sections 
with different aspect ratios (AR) at different Iree-stream Mach numbers, one for shock- 
free subsonic flow and one for transonic flow with a moderate shock. The half-span of 
the wing surface (including upper and lower surfaces) is divided into 20 x G quadrilateral 
panels as shown in Figure 2. The onc-half of the computational domain is divided into 
20 x 1 G x 9 field volume elements m chord, normal and span directions, respectively. The 
size of the computational domain is 2c x 1.5c x 2.25c and 2c x 1.5c X 3c for two AR values 
in the chord, normal and span directions, respectively. It should be noted that the both 
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surface- and field- panel sizes in chord (flow) direction are as large as 10% of chord length. 

The first numerical example is made to the flow around a wing with a 5% thick 
circular arc section of AR = 3 at free-stream Mach number of 0.7, a shock-free subsonic 
flow, where the non-linearity effect is small. Figure 3a is the calculated surface load Mach 
contoures which shows that the flow is purly subsonic. The calculated surface pressure 
coefficients are presented in Figure 3b in terms of contours and Figure 3c in terms of line 
plot, along with the computational results obtained by the non-linear LTRAN3 TSD FD 
code of Guruswang and Goorjian (1982) and by the linear SOUSSA IE code of \ates, 
Cunningham, Desmarais, Silva and Drobenko (1982) at three span stations located at 
0%, 50% and 90% of semi-span. As the figure shows, the presently calculated pressure 
distributions are in close agreement with the non-linear LTRAN3 results over the entire 
wing surface and agreement with the linear IE SOUSSA results except the discrepancy 
over leading and trailing edges. The convergence of the solution is obtained by checking 
the relative error of surface pressure distribution over each iteration, and for this shock- 
free flows, the number of iterations for a convergent solution is G. 

The second numerical example is made to a transonic flow around a wing with a G% 
thick circular arc section of AR = 4 at a free*- stream Mach number of 0.908. In order to 
show the capability of shock- fitting, the solutions obtained with and without shock-fitting 
are presented in Figure 4a through Figure 4d. Figures 4a and 4b are the surface Mach 
contours and surface pressure coeffcient contours without shock-fitting, where the shock 
is diffused but the supersonic flow region is clearly seen in the Figure 4a. Figures 4c and 
4d are the solutions with shock-fitting, whore the shock is clearly predicted. The effect of 
shock-fitting is self-explanatory from these figures. In order to verify the accuracy of the 
shock-fitting, the calculated results are ploted in Figure 4e along with the other reference 
solution. The calculated pressure distributions compare very well with a TSD FD result 
of Bailey and Steger (1972) and another IE result of Tseng (1984) except the discrepancy 
at the station near the wing tip. The location and the strength of the shock are correctly 
predicted by the present method. For the present transonic flow case, 1G iterations are 
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used to get a convergent solution, where the first 10 iterations are used to locate shock 
and additional iterations are used to fit shock. 

The discrepancy near wing tip may be caused by the different tip shapes used in 
different computational models, and hence to have different tip- release effects. To inves- 
tigate this effect, the present computation is made for this case with different wing tip 
thickness. Figures 5a through 5c are the results obtained by tapering off wing tip to 75%, 
50% and 0% of the value at root section, respectively. Figures 5a - 5c show the variation 
of the surface pressure coefficients at the station near tip due to the tip- release effect. 

5. CONCLUDING REMARKS 

An integral equation field-panel method based on the full-potential equation for- 
mulation for transonic flows is presented. The method can be extended foi handling 
flows around general three dimensional configurations, although only non-lifting cases 
are tested. The calculated wing surface pressure distribution is reasonably correct in- 
cluding the location and the strength of the shock. As an alternative to the grid refine- 
ment, the shock-fitting technique applied here does give a correct shock both in loca- 
tion and in strength. The present IEM is effective in terms of the number of iterations 
compared with those of FDM and FVM, althought the computational cost pci IE iteia 
tion is more expensive than those of FDM and FVM. The large grid size (foi example, 
Ax = 0.1c, Ay = 0.1c, Az = 0.25c) used here makes the scheme even more efficient. The 
CPU time for a typical transonic flow case is around 1G5 seconds on Cray-YMP computer. 
Currently the lifting as well as unsteady effects are being added into the computation. 
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Figure 1. Computational model. 
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Figure 3. Shock-free subsonic flow results. 
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Figure 4. Effect of shock-fitting. 
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4b. Surface pressure coefficient contours without shock-fitting. 
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ABSTRACT 

The study of computational performance of a two-dimensional source panel method 
code on the massively parallel computer, CM-5, is made. A serial FORTRAN code running 
on Cray-YMP supercomputer is converted into a parallel CM-FORTRAN code for running 
on CM-5. Detailed performance results are obtained for CM-5 with 32 nodes, G4 nodes and 
128 nodes and for Cray-YMP with a single processor. The comparison of the performance 
indicates that CM-5 out performs Cray-YMP by a factor of 13 for the largest problem 
tested and achieves a speed of about 2 GFLOPS. 


1. INTRODUCTION 

Computer with massively parallel processors (MPPs) may provide orders of mag- 
nitude improvement in computational performance in a near future over today s fastest 
conventional vector supercomputer. The MPP computers employ a large number of small 
processors, which are much less expensive to produce than the vector supercomputer pro- 
cessors, and connect them together such that the computations can be done in parallel to 
achieve extrem high performance. Computational fluid dynamics (CFD) is one of the areas 
which need super- fast computational power. The integral equation panel method is the 
one of the CFD methods, which seems appropriate for parallel processing. The first author 


1 


and his co-worker 1 did the performance study of an source panel method computational 
code on CM-2, a MPP computer. In that work 1 , the resulting linear system was solved by 
a direct method, which was found to be inefficient and hence expensive for large system. 
This short paper presents the recent work based on Ref. [1], and here the resulting lineal- 
system is solved by a much more efficient iterative method. 


2. ABOUT CM-5 

The Connection Machine CM-5 system is a scalable distributed-memory multipro- 
cessor system. The major hardware elements include front-end computer to provide de- 
veloping and execution environments and a parallel processing unit to execute parallel 
operations. The system support both the SIMD (Single Instruction Multiple Data) data 
parallel and MIMD (Multiple Instruction Multiple Data) message passing programming 
models. The maximun possible configuration for the system is lGJfc nodes, where k - 1024. 
Each node has one SPARC processor and four vector processors for a therotieal peak per- 
formance of 128 MFLOPS. Therefore the CM-5 with maximun IGA: nodes installed would 
be a 2 TFLOPS machine theoretically. 


3. CFD-PANEL METHOD 

The physical problems considered here are potential flows around any non-lifting two- 
dimensional configurations. The governing equation to this type of pioblem is given by 
the Laplace equation, 


V 2 <I> = 0 


( 1 ) 


where $ is the velocity potential, V = V4*. This type of problems can be solved by 
integral equation source panel methods (or called boundary element method). The integral 


9 



( 2 ) 


equation solution in terms of velocity field is given by 


V(x,y) = Voo 


2w 


N 

E/ 




(£ 

(* 


- 0**+ (y - q)i 

- 0 2 + (y - ? /) 2 


ds(t,rj) 


where q\s are surface source distributions, which are unknowns to be determined by apply- 
ing boundary conditions; the subscripts g refers to the body surfaces; and N is the total 
number of surface panels. 

It should be noted that Eq. (2) involves evaluating a large number of integrals over 
body surface if the value of N is large. The total number of panels can be very large for 
three-dimensional aircraft configurations, and it can be, for example, in the order of 10 . 
An important feature of Eq. (2) is that the calculation for each (x,y) and each (£,»/) can 
be performed simultaneously for all (x,y) and all (£,»/) with a single instruction. This 
feature of panel method calculation leads itself in a natural way for processing data in a 


SIMD parallel computing environment. 

By applying body surface zero-normal- velocity condition at each (x,y) = (xj,ijj) foi 
j = 1 to iV, a TV X N linear system of equations is obtained as 


[A)M = [J3] 


(3) 


where [.4] is N x N aerodynamic influence coefficient matrix; [</] is a N x 1 unknown vector 
matrix containing q t for i = 1 to N) and [D] is a A X 1 known vector matrix which is 

contributed from the free-streain velocity, V^o. 

The solution procedure for this problem using panel method involves four steps: (a) 
generating body geometry information; (b) evaluating integrals of Eq. (2) for i = 1 to N 
and for (x ,y) = ( Xj,yj ) with j = 1 to N to construct [A] matrix; (c) solving resulting 
linear system of Eq. (3); and (d) post-processing, aerodynamic calculations using Eq. (2). 
The experience shows that the Steps b and c takes most computational time, which is 
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partically true for three-dimensional computations 2 . 

4. PERFORMANCE 

As mentioned earlier, this work is based on that of Ref. [ 1 ] . Due to the inefficiency 
of the direct solver 1 , the resulting linear system is solved by an indirect, iterative method. 
The Jacobi mathod is employed, since the method is very much appropriate for parallel 
environment . List 1 and List 2 are the subroutines for Jacobi iterations m serial FORTRAN 
and parallel CM- FORTRAN versions, respectively. 

The serial code with different numbers of panels (N) is first executed on Gray-YMP su- 
percomputer using single processor to provide the basis for performance comparison. The 
computational performance in terms of MFLOPS is obtained using Cray-YMP s PERF 
TRACE utility. The parallel CM-FORTRAN code is then executed on CM-5 with 32 
nodes, 64 nodes and 128 nodes under slicewise model. 

Table 1 gives the detailed performance results for Cray-YMP and CM-5 computers 
with varying size of the problem. The performance of CM-5 in terms of MI’ LOI S is the 
equivalent Cray-YMP performance. In Table 1, “Mat Coef” refers to evaluating aero- 
dynamic enfluence coefficients; “Lin Syst” refers to solving linear system using Jacobi 
method; and “Total” refers to solving entire code. The sets of results from Table 1 have 
been extracted, and are presented in Figs. 1-4. 

Fig.l shows execution time for evaluating aerodynamic influence coefficient matrix, 
[A], and the matrix [B\ on Cray-YMP and CM-5 computers for different numbers of panels. 
It can be seen that the CPU execution time decreases with the increase of the number of 
CM-5 nodes after the size of the problem is large enough to fully use all nodes. For example 
when N - 1024, the CPU time of 0.21G seconds with 32 nodes is reduced to 0.114 seconds 
with 64 nodes, and then is further reduced to 0.061 seconds with 128 nodes. That is to 
say that whenever the number of nodes used is doubled, the CPU time is almost reduced 
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by a factor of 2 - a near-perfect parallelization. It is also seen that when the problem size 
is large enough the CPU time required on CM-5, even with 32 nodes, is significantly (note 
that L og ! 0 - ax i s is used for execution time !) less than that required on Cray- \ MP. 

Fig. 2 is the CPU time for solving linear system using Jacobi iterations. The results 
tell us that, when the N is large enough, CM-5 out-performs Cray- YMP and the Jacobi 
method is very efficient. 

Fig. 3 shows the total CPU time for solving entire problem. The results are self- 
explanatory. Fig. 4 is a partial reproduction of Fig. 3 for performance results on Ciay- 
YMP and CM-5 with 128 nodes, and it is represented in terms of MFLOPS. From this 
figure it is clearly seen that the CM-5 performs at about 2 GFLOPS when N = 1024. 
The speed achieved here is very encouraging, which is much faster than that achieved on 

Cray- YMP. 

5. CONCLUDING REMARKS 

A source panel method code is successfully implemented on the MPP computer CM-5 
using CM- FORTRAN language. The linear system is solved by the efficient iterative Jacobi 
method. The detailed performance results tire obtained and analysed. The paiallel CM- 
FORTRAN code achieves a very high performance and for most of the cases tested here it 
out-perforins Cray-YMP supercomputer. The highest speed achieved in this investigation 

is about 2 GFLOPS which is very encouraging. 
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SUBROUTINE JBINT (A, B) 
PARAMETER (N=128 , M=128 ) 


SOLVE AX=B USING APPROXIMATE JACOBI ITERATIONS 
SERIAL FORTRAN VERSION 


DIMENSION A(N,N) ,B(N) ,X(N, 100) 

INTEGER VAR 
MAXITER=50 
TOL=0. 001 
AX0=0 . 0 
XMAXDIF=0 . 0 
DO 1000 I = 1 , N 
X(I,1) = 0.0 
1000 CONTINUE 

K = 1 

— 66 K= K+ 1 

DO 200 I = 1 , N 
DO 300 J=1 , N 

VAR = K-l 

IF (J.EQ.I) GOTO 300 
AX0=A ( I , J) *X(J, VAR) + AXO 
300 CONTINUE 

X(I,K) = 1/A(I,I)*(B(I) -AXO) 

XDIF = ABS (X(I,K) — X ( I , K-l) ) 

IF (XDIF.GT.XMAXDIF) XMAXDIF=XDIF 
AX0=0 . 0 

200 CONTINUE 

IF (XMAXDIF.LT. TOL) THEN 
MAXK = K 
GOTO 99 
ENDIF 

XMAXDIF =0.0 

IF (K.LT. MAXITER) GOTO 66 

PRINT*,' NOT CONVERGENT YET AFTER ITERATIONS MAXITER 
RETURN 

__99 CONTINUE 

DO 400 1=1, N 
B ( I ) =X ( I , MAXK ) 

400 CONTINUE 

- RETURN 

END 


noon 


SUBROUTINE jbite 
PARAMETER (N=128 , M=12 8 ) 


SOLVE AX=B USING APPROXIMATE JACOBI ITERATIONS 
PARALLEL CM- FORTRAN VERSION 


DIMENSION A(N,N) ,B(N) ,X(N,100) ,axO(N) ,c(N) 

INTEGER VAR 
REAL XDIF100 (n) 

COMMON/ BLK2 /A 
COMMON/ BLK3/B 
maxiter = 50 
tol = 0.001 
x(: ,1) = 0.0 
k = 1 

66 k = k + l 
var = k - 1 

FORALL ( 1=1 : N) AXO(I)=DOTPRODUCT(A(I, : ) ,X( : ,VAR) ) -A(I, I) *X(I, VAR) 
forall (i=l:n) c(i)=a(i,i) 
x(:,k)=1.0/c *(b-axO) 

xdifl00(l:n) = abs(x(:,k) - x(:,k - 1)) 

XMAXDI F=MAXVAL (XDIF100) 

IF (xmaxdif .LT. tol) THEN 
maxk = k 
GOTO 99 
ENDIF 

xmaxdif = 0.0 

IF (k .LT. maxiter) GOTO 66 

PRINT *, / NOT CONVERGENT YET AFTER ITERATIONS maxiter 

RETURN 
99 CONTINUE 

b = x( : ,maxk) 

RETURN 

END 



Table 1. The detailed 


Task/Size(N) 


Task 

Mat Coef 
Lin Syst 
Total 
Mat Coef 
Lin Syst 
Total 
Mat Coef 
Lin Syst 
Total ~ 
Mat Coef 
Lin Syst 
Total 
Mat Coef 
Lin Syst 
Total 
Mat Coef 
Lin Syst 


N 

32 

32 

32 

64 

64 

64 

128 

128 

128 

256 

256 

256 

512 

512 

_ 512 _ 

1024 

1024 


Cray-YMP 


Time(s) 


[M FLOPS] 


0.0015 

0.0004 

0.0070 

0.0051 

0.0012 

0.0166 

0.0106 


144 

31 

33 
169 

34 
56 
177 


0.0040 


33 


0.0470 


0.0773 


0.0097 


0.1370 


0.3050 


0.0368 


0.4450 


1.2200 


79 

181 

31 

107 

183 
31 
131 

184 
31 
144 


Total 


1024 


0.1420 

1.6100 
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An integral equation method based on the full-potential equation for transonic 
flow calculations is presented. The full-potential equation is written in the moving 
frame of reference, in the form of the Poisson’s equation. The integral equation 
solution in terms of the velocity field is obtained by the Green’s theorem. The 
numerical solutions are obtained by a time-marching (if unsteady flows), iterative 
procedure. The computational examples presented in the present paper include 
steady and unsteady, two-dimensional (airfoil) and three-dimensional (wing) 
flows. The method of combining the integral equation solution with the finite- 
volume Euler solution is also presented. Through studying the method and 
computational examples, the capabilities and limitations of the transonic integral 
equation method are discussed. Finally, the need for further research is addressed. 

Key words: integral equation, field/boundary elements, full-potential equation, 
transonic flow. 


INTRODUCTION 

Starting in the 1970’s a great deal of progress has been 
made in solving transonic flow using the finite-differ- 
ence method (FDM) and finite-volume method 
(FVM). Although the FDM and FVM are successful 
in dealing with transonic flows, there are several draw- 
backs associated with these methods. In the FDM and 
FVM, fine grid points are needed over a large computa- 
tional domain. Moreover, there are major technical 
difficulties in generating suitable grids for complex 
three-dimensional aerodynamic configurations. 

On the other hand, the integral equation method (IEM, 
or called field-boundary element method, field-panel 
method) has several advantages over the FDM and 
FVM. The IEM involves the evaluation of integrals, 
which is more accurate and simpler than the FDM and 
FVM. The IEM automatically satisfies the far-field bound- 
ary conditions and hence only a small limited computa- 
tional domain is needed. The IEM does not suffer from 
the artificial viscosity effects, as compared to FDM and 
FVM for shock capturing in transonic flow computa- 
tions. Morevover, the generation of the three-dimensional 
grid (field-elements) for complex configuration is not diffi- 
cult in the IEM, since the surface fitted grid is not required. 

Engineering Analysis with Boundary Elements 0955-7997/93/S05.00 
CO 1993 Elsevier Science Publishers Ltd. 


Because of these advantages, it is highly desirable to 
fully develop the IEM to treat transonic flows. Integral 
equation methods for transonic flows have been devel- 
oped by several investigators. 1 8 The author and his 
co-workers have been devoted to the development of 
the IEM, for steady and unsteady transonic airfoil and 
wing flow computations, during the past several 
years. 9-15 In the present paper, the recent develop- 
ment, along with the computational examples, are pre- 
sented. Through studying the method and the 
numerous computational examples, the capabilities 
and limitations of the transonic integral equation 
method are discussed. Finally, the needs for further 
research are addressed. 


FORMULATION 
Full-potential equation 

In the space-fixed frame of reference, the continuity and 
momentum equations for unsteady, inviscid compres- 
sible flows with negligible body forces are given by 

§y + pV-V = 0 (1) 

pmmmum page bun* mot filmed 


101 


103 


Study of integral equation methods for transonic flow calculations 



I d * w 

jW|^T + V r • n„. = 0 on w(r, t) = 0 (16) 

and 

VC p = 0 on w(r,/) = 0 (17) 

where n is the surface unit normal vector; the subscripts 
g and w refer to the body (wing or airfoil) and wake 
surface of g(r) = 0 and w(r, /) = 0, respectively; AC p is 
the pressure jump across the surface; and the subscript 
sp refers to edge of separation. 

Integral equation solution 


* 4 Euler [16] 

O O Present 


Fig. 3 . C p distribution, NACA 0012 , A/ x = 0 72 , a = 0 °. 


integrals and the volume integrals become field surface 
integrals, the coordinates, z and are not used; and 
the coefficients of 1/4 tt are replaced by 1/2 tt. The last 
integral term in eqn (18) is used only for the shock-fit- 
ting solutions in steady two-dimensional flows. 


By using the Green’s theorem, the integral equation 
solution of eqn (8), in terms of the relative velocity 
field, is given by 


V r (*,*z,/)= -V o (0-n(0x(r-r,) 


1 

4n 


9g({,v, CO 


e d ds 



7* (6 *I,C0 x d 


ds 


[ f X d 

iJf di 


(18) 


1 

47T 


C(C Vt Ci 0 

y d 2 


e d d£dridC, 


1 

47T 


<7s(£ V, C> t) 


e d ds 


where q is the surface source distribution; 7 is the 
surface vorticity distribution; the subscript S refers to 
the shock surface; the index NW is the total number 
of wake surfaces, d.v is the infinitesimal surface area; 
the vector d is given by d = (* - £)i + (y - 77) j + 
(z - C)k; and t d is defined by t d = d/|d|. 

It should also be noticed that eqn (18) has been 
written for three-dimensional flows. For two-dimen- 
sional flows, the above surface integrals become line 


COMPUTATIONAL SCHEME 

A sketch of the IE computational domains is shown in 
^8* ^ for three-dimensional flows, due to the nature 
of the nonlinearity of the flow, the solutions are 
obtained through a time-marching (if unsteady 
flows), iterative procedure, where the compressibility 
(G,), unsteadiness (G 2 ), and the wake shape and 
its strength are updated through each iteration. The 
details of the solution procedure can be found in Refs 
9 and 14. 


NUMERICAL EXAMPLES 

The integral equation method has been applied to steady 
and unsteady, two-dimensional and three-dimensional 
transonic flows. The computational results, along with 
the experimental data and other computational results, 
are presented in the following sub-sections. 

Steady subsonic airfoil flow 

The computational results for a steady compressible 
shock-free flow at a high subsonic Mach number 
are presented here as the first numerical example. The 
purpose is to validate the IEM for nonlinear compressi- 
ble flows. The surface pressure distribution 910 is shown 
in Fig. 3, along with the finite-difference (FD) Euler 
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Fig. 7. Hybrid IE-FV computational domain for unsteady flows. 


may be required to narrow the shock region, and to 
predict the shock motion as accurately as possible. 


CAPABILITIES AND LIMITATIONS 

The steady and unsteady integral equation methods for 
nonlinear compressible flows have been developed. The 
methods have been applied to steady airfoil, unsteady 
airfoil and unsteady wing flows, with or without 
shocks. The comparison of the present solutions with 
experimental data and FD or FV solutions shows that 
the integral equation methods, based on the linear theo- 
rem, can handle nonlinear flow problems accurately. 
For transonic flows with shocks of weak to moderate 
strength, IEM predicts shocks correctly, with the excep- 
tion of slight underprediction of the shock strength (Fig. 
4). For unsteady flows, the motion of the shock agrees 
with that predicted by FV Euler solutions (see Refs 9 
and 11), and the predicted lifting coefficient agrees with 
one obtained by FV Euler computation (Fig. 5). 

The advantages of the small computational domain 
and coarse grid have been utilised in the present IEM. 
For airfoil flow computations, a computational domain 
of 2x1-5 airfoil chord length with 64x60 
field-elements has been used. The application of a 
smaller number of field-elements, with larger 
field-elements over the outer region inside the domain, 
is possible. For wing flow computations, a compu- 
tational domain of 2-3 x 0-75 x T5 wing root chord 
length with 23 x 9 x 9 field-elements has been used, 
although finer field-elements around the shock region 
may be required to accurately predict the shock loca- 
tion and its strength. 

The number of iterations used for steady flow compu- 
tations are approximately 25 for airfoil and five for wing 
flows, respectively. This is much less than those used in 
FD and FV computations (usually of the order of 10 3 ). 
For unsteady flows, the number of iterations used in 
each time step range from one to three. Large time steps 
have also been used in the present unsteady flow compu- 
tations. This is also one of the advantages of IEM. For a 
whole cycle of pitching oscillation, for example, a total 



Euler [19] 
C Present 


Fig. 8. Euler domain and C p distribution, a strong shock case, 
steady flow, M x = 0 84, a = 0. 


of 36 time steps have been used; while a typical implicit 
FD or FV computation needs about 500 time steps for 
the same case. Therefore, IEM is nevertheless efficient 
in terms of the number of interations and the time step 
size, as compared to existing FDM and FVM. 

By examining the numerical examples presented here 
it is found that, on the other hand, all these compu- 
tations are restricted to the flows with shocks of weak 
to moderate strength. As the best of the author's knowl- 
edge, all existing integral equation solutions, based on 
potential flow formulation for transonic flows, are 
restricted to flows without strong shocks. The potential 
flow assumption neglects the effects due to viscosity, 
vorticity and entropy production. For transonic flows 
with strong shocks and massive separation, the poten- 
tial flow assumption is not an adequate approximation 
to the real flow. 



Fig. 9. Lifting coefficients, ramp motion, M„ = 0 56, 
«(0 = -0-01° +0-855T. 
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1(c). Parallel Intel- FORTRAN version. 
List 1. Program listing for matrix calculation. 
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Figure 4 . Computational speeds in terms of MFLOPS. 
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Unsteady transonic wing flow computations using 
field-boundary element methods 

Hong Hu 

Department of Mathematics, Hampton University, Hampton, Virginia 23668, USA 


An unsteady integral equation (or called field-panel, field-boundary element) 
scheme for solving the full-potential equation for transonic unsteady wing flows 
has been developed. The unsteady full-potential equation has been written in a 
moving frame of reference, in the form of the Poisson’s equation. Compressibility 
and unsteadiness have been treated as non-homogeneity. The integral equation 
solution in terms of velocity field is obtained by the Green's theorem. The 
solution consists of a wing surface (boundary elements) integral term of vorticity 
distribution, a wake surface (boundary elements) integral term of free-vortex 
sheet and a volume (field-elements) integral term of compressibility and 
unsteadiness over a small limited domain around the wing. Numerical solutions 
are obtained by a time-marching, iterative procedure. Time-derivative term is 
calculated by a second-order backward finite-difference scheme. To be consistent 
with the mixed-nature of flows, the Murman-Cole type-difference scheme is used 
to compute the derivatives of the density. The present scheme is applied to flows 
around a rectangular wing at transonic speed undergoing acceleration motion and 
transient pitching motion, respectively. The time history of wing surface pressure 
distributions has been presented. 

Key words: unsteady transonic wing flows, full-potential equation, moving frame 
of reference, field-boundary elements, integral equation. 


NOTATION 

C p Surface pressure coefficient 

d Distance vector pointed from sender to receiver 

ds Infinitesimal surface area 

e d Unit vector of d 

g Wing surface 

G G\ + G 2 

G\ Compressibility 

G 2 Unsteadiness 

Mi Initial value of M a 

M a Translation Mach number 

M (} Rate of change of M 0 

n Surface normal unit vector 

oxyz Moving frame of reference 

OXYZ Space-fixed frame of reference 

r Position vector 

r p Pivot point vector 

t Time 

V Absolute velocity 


Engineering Analysis with Boundary Elements 0955-7997/92/S05.00 
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V e Rotation velocity 

V G Translation velocity 

V r Relative velocity 

w Wake surface 

q Angle of attack 

ot\ Initial value of a 

q Rate of change of a 

7 Vorticity 

k Gas specific heat ratio 

p Density 

$ Absolute velocity potential 

ft Angular velocity 

INTRODUCTION 

Starting in 1970, a great deal of progress has been made 
in solving transonic flows by using the finite-difference 
method (FDM) and finite-volume method (FVM). 
Although the Navier-Stokes equation formulation for 
the transonic flow computations has been understood as 
the best model and the FDM and FVM are successful in 
dealing with transonic flows, the computation of the 
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Unsteady transonic mngfio* computations using field-boundary dement methods 

integral term in eqn (12), decreases rapidly with 
increasing distance, d , not only because of t e ac or 
of 1 M 1 but also G(£,t),C, 0 diminishes rapidly with 
increasing distance. Consequently, for computation^ 
purposes, the volume integral term needs to be 
addressed only within the immediate vicinity of the 
body. This is the beauty of the IE methods. 


The associated boundary conditions are described in the 
next sub-section. 


Boundary conditions 

The boundary conditions are surface no-penetration 
condition, Kutta condition, infinity condition, wake 
kinematic and dynamic conditions. They are described 

as follows; 

on g(r ) = 0 


V r -n g = 0 


A C p \ sp = 0 


(7) 

( 8 ) 


Vd> — + 0 away from g(r ) — 0 and vv(r, t) 0 (9) 

1 d'w 


|Viv| dt 


' + V r -n K =0 on vv(r, 0 = 0 


ACp = 0 


(10) 

( 11 ) 


on vv(r, 0 = 0 

where n. is the unit normal vector of the wing surface 
„(r) = 0- C p is the surface pressure coefficient; the 
subscript sp refers to the edges of separation, and in t e 
present scheme the only separation from the wing 
trailing edge is considered; and w(r,t) = 0 is the wake 
surface. 

IE solution 

By using the Green’s theorem, the integral equation 
solution of eqn (2) in terms of the relattve veloctty field 

is given by 

Vr(x.ysZj) ■ 


COMPUTATIONAL SCHEME 
Discretisation 

A sketch of the computational model, with the relative 
size of the computational domain, is shown in big. • 
The wing and its wake are represented by triangular 
vortex panels. A uniform rectangular parallelepipe 
type of volume elements are used throughout the flow 
field. The discretised integral equation solution become 

K(x,y,z, t) 

= -^( 0-^(0 x ( F_ 

7* 


V„{t) - n(0 x ( F “ F />) 

. . . TvUfthCtO * d 

+ — I I 

7«Utfi,C,0 


+ 


ill. 
II. 


47T 

1 

4tt 


d 3 
Hi 

y d 2 


ds{i,T},0 

d.v(£,r?,C,0 


+ ± J[[ 


+ 




« LW SW r f 

iSSlL 


. IV MV NV 
H7r i=\ j=\ k = 1 


t)xd 

7 5 


d $((;,»?, C 0 


1 


v, .0 d 


j e d d£ d /7 dC 


(13) 

where the indices, / J and k refer to the surface panels 
and field elements; LG x NG is the total number of wing 
surface panels; LW x NW is the total number of wake 
surface panels; and LV x MV x NV is the total number 
of field elements. A constant ^-distribution is use ove 
the small field element, while a linear r d.stnbution is 
used over the small surface panel. 


(12) 

where 7 is the surface vorticity distribution, the 
subscripts * and w refer to the wing and wake 
surfaces, respectively; dr is the infinitesimal surface 
area- the vector d is given by _d-{x €)' 

T i)T+ (z - C)/c; and e d is defined by e d - d/\d\. 

In eqn (12), the first integral term is the contribution 
of the wing surface vorticity; the second inle 8 r ^ e ™^ 
the contribution of the wake vorticity; and the third 
integral term is the contribution of the full compressi- 
bility and the unsteadiness. It should be noticed t a 
infinity condition, eqn (9), is automatically satisfied by 
the integral equation solution. It should also be noticed 
that the integrand of the volume integral term, the third 
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Fig. 6. C, distribution at M () (t) = 0*8 over wing surface, 
acceleration motion. 

The time derivative term of the potential, (d'$/dt)^ n \ 
can be numerically calculated by and and 

hence and must be calculated by integration 

of the velocity field numerically. In order to avoid 
numerical error when doing this numerical integration, 
eqn (6) is used to compute (d'$/dt)^ distribution. 
Thus eqn (6) takes the form 

aW" 1 i — p ( "~' r ' 

dt ) k - 1 (17) 

-i v r (n)2 +\{V„ {n) + v e in) f 

With Gj"* obtained from eqn (16) and (d '$> / dt)"’ 
obtained from eqn (17), Step 1 is repeated until the 
solution converges. 

Step 3 - at time step (k — n+ 1 ) 

Step 2 is repeated for time step (n + 1). 



Fig. 7. Unsteady C p history at z = Ml, pitching motion. 



Fig. 8. Unsteady C p history at z = 4*44, pitching motion. 


NUMERICAL EXAMPLES 

The present scheme has been applied to a zero-thickness, 
rectangular wing with aspect ratio of two. The half-span 
of the wing and the wake is divided into 10x6 and 
10 x 10 quadrilateral panels, respectively. Each quadri- 
lateral panel consists of two triangular panels. The one- 
half of the computational domain is divided into 
23 x 9 x 9 field volume elements in x,y and z direc- 
tions, respectively. Two numerical examples are pre- 
sented as mentioned before. The first one is the 
acceleration motion and the second one is the pitching 
motion. 

Acceleration motion 

In this numerical example, the wing is given an 
acceleration motion at an angle of attack of 5 degrees. 



Fig. 9. Unsteady C p history at z = 7*78, pitching motion. 
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lion for further code conversion mid performance evaluation of 
coniples tliree-dimensional boundary element tluid d>namics compu- 
tational code. 

In the next section tlie basic aspects of the CM-2 macliine and CM- 
FORTRAN are brie tly reviewed, wliich is followed by the description 
ol die physical problem under tliis investigation in Section 3. the code 
conversion and performance evaluation are presented in Sections 4 
and 5, repec lively. Finally in Section 6, die concluding remarks are 
drawn. 


2. CM-2 and CM-FORTRAN 

The coiuiection machine CM-2 system is an integrated combination of 
hardware and software designed for high speed, large problem parallel 
computation. The hiirdvvare elements of the system include front-end 
computers, a SIMD (single instruction multiple data)parallel process- 
ing unit to execute the data parallel operations, and a lugh performance 
data parallel operations, and a high performance data parallel I/O 
system. The SIMD parallel processing unit is the heart of CM-2 
system, which contains up to 65536 single-bit physical processors 
(64k) in blocks ot Sk (k= 1024). The CM-2 used in die present work 
at NASA -Ames Research Centre lias 32k processors widi each 
processor operating at a clock speed of 6.7 Ml Iz. The aggregate peak 
performance for diis 32k CM-2 is about 2 GFLOPS. 

The 32k single-bit processors on dus 32k-CM2 are grouped in 
1 024 nodes of 32 processors each, Each node also has 64-bit Wcilck 
floating point co- processors, 4 MB of local memory, and hardware for 
inter processor communication. Parallel data structures arc spread 
across die data processors, with a single element stored ui each 
processors’ s memory If die number of the parallel data elements 
exceeds the total number of physical processors, die connection 
machine creates virtual processors by dividing die memory of each 
physical processor, i he ratio ol virtual to physical processors is known 
as die VP-ratio, R In general, tl outing-point performance (usually in 
terms of MFLOPS or GFLOPS) is maximum when Rvp is as large as 
possible, since die communication oveihead is reduced as Rvp 
increases, ihc detailed description about CM-2 computer is docu- 
mented in many places, such as References 1 and 2. 

The CM-FORTRAN laguage is an implementation of FORTRAN 
77 supplemented widi array-processing extent ions from the ANSI 
draft and ISO standard FORTRAN 90. dliese array-processing fea- 
tures map naturally onto die data parallel architecture of the CM-2 
system, since CM-FORTILAN allows array elements to be evaluated 
simultaneously. Relerence 3 explains these extensions, while Refer- 
ence 4 gives a full description of CM-FORTRAN. The most important 
difference between CM-FORTRAN and FORTRAN 77 is the treat- 
ment ot entire arrays us objects in CM-FORTRAN, thus explicit 
indexing in CM-FORTRAN is not always necessary. For example, tl 
is not necessary to WTite Do- Loops or other such control constructs to 
have die operation repealed for each element of arrays. 'Hicreforc, tliis 
teature maps the problem directly to the CM-2 with minimal program- 
ming effect. The further explanation ofCM-FORTKAN will be given 
along with the code conversion example in Section 4. 


where G represents full linear or non-linear eompressibilty and is a 
function of <1> in general. Tliis type of problem can be solved by 
boundary element methods (also called panel methods, integral 
equation mediods). The boundary element method is based on Green’ s 
theorem, which represents the solution in terms of integrals over body 
surfaces, separated surfaces and volume around die body. The surfaces 
and volume are then divided into a large number of elements, where 
integrals are evaluated. 

The boundary element solution in terms of velocity field (V — 
V4») is given by 5>6 


'''(*.!/>*) = Vc 
LG NG 


i=lJt=l J J S*.k 


LG NG 


hzzJl 


MS I.S NS 


lAZ.’lX) x d 


J 3 




( 2 ) 


+ ^S/L,. 

+hfzzJJL 

;=ii = u=i J J 


7w((>vX ) x d 

<f J 




& 


j d£dijd{ 


where q and 7 me the surface source and voiticity distribution 
respectively, which are unknowns to be determined by applying 
boundary conditions, the subscripts g and w refer to die body and 
separated surlacos* respectively, where MS is the total number of 
separations; dv is the infinitesimal surface urea; die vector cTis given 
by d=(x - O' +(l/ - J i)j T - C)^i anci C V is defined bv e d ~ 

d/\d\. 

It should be noted that equ (2) involves evaluating a large number 
of integrals (in fact dicy lue vectors also!) over U)dy surface (total of 
2 x LG x NG ), separated surface (total of MS X LS x NS Jund 
volume (total ot LV x MV x NV). The total number of elements 
can be very large for aerodynamic problems, mn\ for aircraft configu- 
rations it can be, fur example in die 01 derof 106. An important feature 
of equ (2) is dial the calculation I61 each (x,y,/) and cucl»(f , qX) can 
simultaneously for all (x,y,z) and (£, r;, () widi a single instruction. 

1 Iris teat ure ol boundary element calculation leads itself in a natural 
way for processing data in a SIMD parallel computing environment. 

For simplicity in the present work, an incompressible How (G=0) 
past a two-dimensional symmetric configuration at zero incidence is 
considered. At this simple flow condition, cqn (2) reduces to a much 
simpler form. 


v(x>y)= Vc 

N 


- JL sr f .. ,c n )( x - O* + (v - n)J . , 

2 * h l- 


( 3 ) 


By applying body surface zero-normal -velocity condition at each 
(x.y) - (jy/) for >“1 to j\\ here N~ LG , a NxN system of equations 
is obtained as 


3. Fluid Dynamics - BEM 

I he physical problems considered here are potential llows around any 
arbitrary complex configurations (let us call it ‘body’), including 
incompressible and compressible tlows with and without separations. 
1 he governing equation to diis lyjie of problem can be written 111 die 
form of Poisson’s equation given by 

V 2 <*> - G (0 


WW = [23] (4) 

where [A] isAxAinfluenee coefficient matrix, [r/] is an N x 1 unknown 
vector matrix containing q f Tory - 1 to N\ and [13 \ is an N x 1 known vector 
matrix which is contributed from in diis simple two-dimensional 
llow case. 

The solution procedure for the above two-dimensional problem 
using boundary element method involves four steps: (1) generation of 
body geometry information; (2) evaluation on integrals of cqn (3) tor 
/= 1 to N and for (x,y) = (x f y) with j= 1 to N to construct [A \ matrix , (3 ) 
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diflcrcnce between the pres P an d i£] matrix 

gctwtiil cwnpkK Hows govern^ yj- gcneraito-dintooiul 

«**“r “l iSSE^ shown « th. m - m »-ix 

=»S=?Wi 

ESrr^r^rr^ 

4. CM-FORTRAN implementation 

c i FORTRAN codes of two-dimensional source panel (boundary 
Serial FOR IRAN coues u ... available in some 

element) methods describe yoq ^ FORTRAN code in 

references, such as is [or Uu . C M-2 code present here. 

■^v^^^l^O^rR^^cale lias first modified tor efficient executing 
lliis senal rORTl C M-2 computer, The listing ol 

on NASA-Langley Research fu Appendix, where several 

the CM-l'ORTRAN ^ pertonmuice results. 

CM-Timmg routuies have bum ^ ‘ durc consists of four 

AldiongliUieto^^n ^ illtcgral s and Step 3 for 

steps discussed earlier, P computational ume. 

solving lmear the system li* « „ ukc m0 re than 95% 
Experience lias shown that these twerps us Y )k S)/c 

«** ,s •“ 

Theretore. the present inteiesi oi l computations, 

focused on Step 2 computations and then ^ 
although llie code is fully converted into CM-1 OR 1 RAM. 

SUBROUTINE MATELM 

xc, S .vcM, 

1 , UINF , VINF , SDE 

00 2 K“ 1 , N 

°° ^K J “eQ N J) FM(K,J)- 2 . 0 *M 
% Ik.'eq. j> FT(K,J)» 0.0 
if (K. EQ. J) coa '° 1 

DVJ-SI(J)*0S(J) 

DXJ-CI (J) *DS(J) 

SPH-DS(J)*0*5 

XD“XC(K)-XC(J) 

YQ-YC (K) “YC( J) 

KK J- SQRT ( XO * XD+ Y D * V D ) 

BKJ-ATAN2 (YD, XD) 

ALJ-ATAN2 (DYJ, DXJ) 

G K J •* ALJ ~ B K J 
2IK-RKJ*C0S (GKJ) 

ETK«*”'RKJ*i»IN (GKJ ) _« 

BlS-( (EXK+SPH) **2)+ETK*ETK 

r t S »(( 2IK-SPH) **2)+ElK*ETK 
QT-AL0C(R1S/R2S) 

DEN-ZIK*ZIKfETK*ETK-EPH*SPH 

CNM-ETK*DS(J) 

QN-2 . 0*ATAN2 (CHM, DEN 
UKJ“QT*CI ( J) -QN*SI ( J) 

wr«OT*SI ( J) +QN*CI (J) 

J1--UKJ*SI(K) + VKJ*CI(K) 

E-rtK* J)-UKJ*CI (K) +VKJ*SI (K) 

1 S?-UXHF. S X(K)-VXNF.CX(K) 

2 CONTINUE 
RETURN 
END 

Fig. 1(a) FORTRAN 77 subroutine for calculating matrices 


? jrrssr£-- 

is noted dial this subroutine is nothing but a two- c 


Do-Loop, which provides for iTvecloS^fthe 
I automatically done Uirough the vecton.ation capa- 
bility of the FORTRAN 77 compiler. 

SUBROUTINE MATELM 

SSSS'Jw : 5 m '■ * «) • 1 

1 , FT (N, N) ,RHS(H) . SDE W y D( n N) ,VD(N,N) 

2 DW (H.H) . DXJ (N, N . SPU N ,N •»<“$, ZIK(N, N) 

7 ',XC2 (N'.N) VC2 (r',N) XC3 («', N) , VC3 (N , N) 

8,SI3(N,N) # CI3(N»N) 

SSS^x^&! 5 S?S:™.”‘“ a -«* CPI>cx *“ 

H^f-'spKxCgDIM-l.RCOPIES-N) 

YC2 - SPREAD (VC, DIM-1, NCOPIES-N) 

vcl - SPREADtXC, DIM-2, NCOPIES-N 
VC3 - SPREAD (VC, DIM-2, NCOPIEj- ) 

«2-SPREAD (SI, DIM-1, NCOPIES-N 
Sex DIM-1 , NCOPIES-N) 

iaK!S:SK:S 5 S: 

WHERE (MAIN_DIAG) 

IN - 2.0 * PI 
FT - 0.0 
elsewhere 

DYJ - SI2 * 0S2 
DXJ - CI2 * DS2 
SPH - DS2 * 0.5 
XD - XC3 - XC2 
YD “ YC3 “ YC2 

RKJ-SQKT(XD*XD +YD*YD) 

BKJ“ATAN2 (YD, XD) 

ALJ=ATAN2 (DYJ , DXJ ) 

GKJ - ALJ _ BK J 
ZIK-RXJ*COS (CKJ) 

ETK»-RKJ*SIN (GKJ) P qvi/*ETK 

R1S«( (ZIK+SPH) + LaK *“™ 

R 2S— ( ( ZIK— SPH) * *2 ) + ETK*E1X 

DEN-ZIX* XIK^ +"* ETK * ETK - SPILSPH 

QN=2 E 0*ATAN2 (GNM, DEN) 

UKJ-QT*CI2-QN*SI 2 

VKJ“QT * S 1 2 +QN * Cl 2 
FN _UKJ^I^tVKJ* CI 3 

FT-UKJ “CI3+VKJ * i>I 3 
ENDWHERE 

RilS«UINF*Sl-VINF*CI 

RETURN 
END 

Fig. 1(b) CM-FORTRAN subroutine for calculating matnees 
Figure 1(b) is the list of 

evaluating coetliuent matrix, l- ’ , ls j x;cn seen here since 

tINU . .. i. Tivit-fi tiie CM-FORTRAN intrinsic iiuiclion Si u 

m- «k.« to yy»y.n o p jW* - ^ 

elimination method in Cray senal 1 OR 1 RAN version 




Table 1. The detailed computational performance results 


Task/SizcfN) 1 

8K-CM2 I 

1CK-CM2 | 

32 K- CM 2 f 

Cray-YMP | 

Task 

N 

Time(s) 

M FLO PS 

Time(s) 

M FLOPS 

Tirne(s) 

M FLOPS 

Time(s) 

M FLOPS 

Mat Coef 

32 

0.0173 

12.3 

0.0173 

12.3 

0.0173 

12.3 

0.00152^ 

140 

Lin Syst 

32 

0.125 

0.210 

0.123 

0.210 

0.123 

0.210 

0.00152 

17 

To t al 

32 

0.156 

1.64 

0.154 

1.66 

0.154 

1 .66 

0.00851 

1 — 

Mat Coef 

64 

0.0104 

44.5 

0.0194 

44.5 

0.0186 

46.4 

0.00523 

165 

Lin Syst 

64 

0.261 

0.741 

0.252 

0.767 

0.250 

0.774 

0.00879 

22 

Total 

64 

0.294 

3.78 

0.280 

3.85 

0.282 

3.90 

0.0250 

44 

Mat Coef 

128 

0.0310 

112 

0.0224 

155 

0.0203 

171 

0.0202 

172 

Lin Syst 

128 

0.717 

2.03 

0.590 

2.46 

0.534 

2.72 

0.0581 

25 

Total 

128 

6.704 

6.61 

0.627 

8.05 

0.569 

8.89 

0.101 

50 

Mat Coef 

256" 

0.0680 

205 

0.0447 

312 

0.0323 

432 

0.0793 

176 

Lin Syst 

" _ 25G” 

3.08 

3.77 

2.0U 

5.81 

1.45 

8.01 

0.415 

28 

Total 

266 

3.17 

8/24 

2.07 

12.6 

1.50 

17.4 

0.544 

48 

Mat Coef 

512 

0.205 

273 

0.115 

487 

0.0685 

818 

0.313 

179 

Lin Syst 

512 

18.7 

4.87 

10.4 

8.76 

6.18 

14.7 

3.14 

29 

Total 

512 

10.0 

7.89 

10.5 

14.3 

6.30 

23.8 

3.57 

42 

Mat Coef 

1024 

0.731 

308 

0.383 

587 

0.206 

1092 

1.25 

180 

Lin Syst 

1024 

130 

5.20 

71.3 

10.3 

37.7 

ioT 

24.4 

30 

Total 

1024 

140 

6.80 

71.7 

13.4 

38.0 

25/2 

25.9 

37 

Mat Coef 

2048 

2.84 

316 

1.43 

627 

0.737 

1216 

4.98 

180 

Lin Syst 

2048 

1036 

5.60 

546 

10,5 

277 

21 

192 

30 

Total 

| 21)48 | 1030 

6.48 

5-18 

12,3 

278 

24 

198 

34 


computational performance, this linear system is solved by callmg a 
modified Gauss-Jordan function routine from CM Scientific Software 
Library (CMSSL) when running in CM-2 computer. CMSSL is 
created lor data parallel architectures ru id is designed to handle 
concurrent applications. Although the Gauss-Jordan method requires 
about 50% more operations compared to the Gauss elimination 
method, the Gauss-Jordan method is more appropriate for data parallel 
computation. Another reason to use Gauss-Jordan ra liter the the Gauss 
elimination method is simply that only the Gauss-Jordan method is 
available from CMSSL. 


5. Performance analysis 

The serial code with different numbers of boundary elements (A/) is 
first executed on the Cray-YMP supercomputer using single processor 
to provide the basis for performance comparison. Tlic Cray- YMP used 
here lias 5 processors (CPUS) with 128 MW SRAM central memory. 
Each CPU is a register-to-register vector processor with peak perform- 
ance at 150-300mllops. The computational performance in terms of 
inllops is obtained using Cray-YMP’s PERFTRACE utility. Hie 
parallel CM-FORTRAN code with ditlerent numbers of boundary 
elements is then executed on CM-2 with 8k, 16k and 32k processors 
under a si ice wise model. 

Table 1 gives detailed performance results for Cray-YMP and CM- 
2 computers with a varying size of problem. The perlonnance ol CM- 
2 in terms of tnflops is the equivalent Cray-YMP performance. In 
Table 1 , ‘Mat Coef refers to the subroutine for evaluating matr ices; 
‘Lin Syst’ refers to the routme for sol vmf linear system, and ‘Total 
refers to the total compulation for entire code. Hie sets of results form 
Table 1 have been extracted to be presented in Figs 2-5. Each figure 
is discussed below. 



Fig. 2 Performance comparison for calculating matrices 

Figure 2 shows execution time for evaluating the influence coeffi- 
cient matrix, [A], and the matrix [£j obtained on Cray-YMP and CM- 
2 computers for different numbers of boundary elements. The CM-2 
performance at each number ofboundury elements, us shown in 1 able 
1 , is represented by solid points which are connected by dashed lines 
since these lines do not represent the actual variation of execution time 
between each point. It can be seen that the epu execution time 
decreases with the increase of the number of CM-2 processors after the 
size of the problem is large enough to fully use all processors, l'or 
example when /V=2Q48, the epu time of 2.84 seconds with 8k 
processors is reduced to 1.43 seconds with 16k processors, and then 
is further reduced to 0.737 seconds with 32k processors. 'Dial is to say 
that whenever the wavenumber of processors used is doubled, the 
CPU time is reduced by a factor of 2. It is also seen that when the 
problem size is large enough for the CPU time required onCM-3,even 
with 8k processors, is significantly (note dial Log l(t -axis is used lor 
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execution time!) less than that required on CM-2 with 32k processors 
isabout l/7odfthat required on Cray-YMP. Hus is a very encouraging 
result. 



Fig. 3 Performance comparison for solving linear system 

Figure 3 shoes the CPU time for a linear system solver. It should 
be mentioned tlial again before comparing performance, the Gauss- 
Jordan solver requires about 50% more operations compared to the 
Gauss elimination solver. Hie comparison shows tliat the epu time 
required on CM-2 with 32k processors approaches that required on 
Cruy-YMP with the increase of the problem si/e. For example, the 
CPU time required on CM-2 with 32k processors is 0. 1 25 seconds tor 
,V=32 which is much larger than lliat for Cruy-YMP ot 0.00152 
seconds; while tlus comparison becomes 277 seconds to 1 92 seconds 
when jV= 2048. Therfore it can be expected thet when the problem size 
becomes large enough the CM-2 with 32k processors will outperform 
Cruy-YMP for solving linear systems By comparing the CPU time on 
CM-2 with 8k, 16k and 32k processors it is believed tlial it 64 k 
processors are used, CM-2 will outperform Cray-YMP even at a not 
very large A/- value, for example at A f =5 12. 


Figure 4 shows the total epu time. It is found that the situation in 
this figure is similar to that of Fig.3., since in the present physical 
problem, the solution of the linear system takes most of the compu- 
tational time when N is large. But it is not true lor general three- 
dimensional complex flows where separation occurs and compressibilty 
is important, as governed by eqn (2). In such complex flow situations, 
the experience has shown that the evaluation of coefficient matrix, [A], 
and die matrix [B] usually take above 80% of the total computational 
time, 5 - 6 as mentioned earlier. 



Fig. 5 Computational speeds lor calculating matrices in terms of 
MFLOPS 

Figure 5 is a reproduction of Fig. 2 for performance results on Cray- 
YMP and CM-2 with 32k processors, and is represented in terms ol 
mflops. From this figure it is clearly seen that the CM-2 perlormcd at 
above 1.2 G FLOPS when N=2(MK. 'Plus speed is about to reach the 
machine’s aggregate peak speed of aboul2 gllops.The speed achcivcd 
here is one of the highest speeds achieved on CM-2 with 32k 
processors for lluid dynamics problems done by some investigators.*' 
12 

6. Concluding remarks 



A simple source panel (boundary element) code lias been sucesstully 
implemented on the massively parallel Connection Machine CM-2 
using CM-FORTRAN language, 'ILe detailed performance results are 
obtained and analysed. Some concluding remarks cun be drawn from 
diis investigation. First, the conversion of serial fortran code to parallel 
CM-FORTRAN code is straightforward with little difficulty. Second, 
CM-FORTRAN code achei ved a very high performance and for most 
of the cases tested here it outperformed or near pertormed the Cray- 
YMP supercomputer. The highest speed achieved in litis investigation 
is above 1 .2 GFLOPS which is very encouraging. Third, the boundary 
element method is more appropriate for data parallel processing 
compared to finite-difference or finite-volume methods. Fourth, fur- 
ther computational performance investigation can be made on real 
life general three-dimensional complex flow problems using bound- 
ary element methods where an even higher performance should be 
expected on CM-2. 
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Appendix 

• tll • list of CM-FORTRAN code lor two-dimensional 

r llK following is the hst oi wvi 
boundary element (panel) metliod. 

parameter ------ doOT 

SSUS VELOCIT, »» «E„»«ES 

c AKUITRARX 2D UOD * ESCim , T ioN 

c JJ RW,ETl 5 fc*M* IM XXIS LENGTH OF the ellipse 

i “kh 

c VN DUE TO UNIT -^“f^AHCEUTXAL TO PANEL K 

^ PJ, INDUCED VELOUR* " — 


DUE TO UNIT SOURCE^ jd poltlTS 

m HUMBER OF ttheHTS 

" HUMBER OF ELEMLHTS 

SI ESS KfiS™ 

n-r ss“/v“i“?«srpo.« s 

feV ISS'IS “ 

QE Surface of ellipse 

l.FTtM.MJ.RHStM^SDE J^Ijt.rhs.M.CPI.CI.SI 

COMMON X,*#*'-'* * 

l.UI«*;,VIUF,SDE iv0T(M ) ,WK(N) 

DI ^ N |m TIMER ctEaUB) 
g 3 £ CMlTIMERlET/HlTlB) 

UINF - l-° 

V1NF - 0.0 

fkn - o.°o 
U - 1.00 
NMAX » N 
NRliS - 1 

^ D.I4IS92CS 

CFl “ 2.0/PI 

®:3>5«;^S£ WITH *.I 5 . 'ELEMENTS' 

calculate"” coordimates of bodi ahd 

CONTROL POIMTL-Step 

:Ttt rrt TIMER CLEAR ( 0 ) 

CALL CM “ TIMEr “ STA 1 <T ( 0 ) 
r ,i r tj 0 DV(FHU|C) 

cM TIMER STOP(O) 

CALL CM _ TIMER>R1MT(U)_ 

CON LTKU^^TUe” t^TRt X ^EQU ATI ON 

“it SBSSb«8 

O^EL “-^ER-pRIHT ( 1 ) 

EsSSSSS!?! 

AA - FN 

«fl‘fflTlHE» STOP ( 9 ) 

cS cmItimeORIM^!! 

r ''lSiiS»srs»:crsdE 

= XS^ CALLED FROM_CMSBL 

C *riMER CLEAR ( 10 ) 

^ sst^r 

B all* cm^timer STOP (10) 

O^EL ™- x xmER _ PUINT ( 10 ) 

SS SiWcLEga 

CMJ. srjj 1 .; 

PRINT *> ",I a - B G,N,NRHS, 

s^Hfess--- 

ffiS^sassai 

«ls:S“5r: 
c ""ii'Sss^ssi! 

C ALL CM R TlMER l SrOP(2) 

Esisss”-’, 

SIS Sjrw:*" 1 " 

STOP 
END 
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SUBROUTINE JJODY(FMN,B) 
PARAMETER (N“32 , M-3 3 ) 


CALCULATES BODY AND CONTROL POINT COORDINATES 
FOR AND ELLIPSE WITH MINOR SEMI-AXIS, B 


DIMENSION X (M) , Y(M) ,XC(N) , YC(N) ,DS(N) ,FN(N,N) 
1,FT(N,N) , RHS (N) ,SDE(N) ,CI(N) ,SI(N) 

2 , SX (N) , SY ( N ) ,TH(M) , AT (M) 

COMMON X,y,XC # YC, DS, FN, FT, RHS , PI , CPI , Cl , SI 
1, UINF, VINE, SDE 
NHLFF - N/2 + 1 
NHH - NHLFF + 1 
AN “ NHLFF - 1 
DTH “ PI/AN 

FORALL(I“l : NHLFF) AI(I)-I-I 

TH ( 1 : NHLFF) « PI - AI ( 1 : NHLFF) * DTH 

X(l: NHLFF) - COS (TH ( 1 : NHLFF) ) 

Y ( 1 : NHLFF) « B * SIN (TH ( 1 : NHLFF) ) 

X (NHH : M) — X (NHLFF-I : 1 : -1) 

Y (NHH :M) “ -Y(NHLFF-1:1:-I) 

XC— (X ( 1 : N) +X ( 2 : N+l ) ) *0.5 
YC— (y(l:N)+Y(2:NH) ) *0.5 
SX-X (2:N+1)-X(1:N) 

SY— Y ( 2 : N+l ) -Y (1; N) 

DS-SQRT(SX*SX+SY*SY) 

CI-(X(2:N+1)-X(1:N) )/DS(l:N) 

SI— ( Y ( 2 : N+l) -Y ( 1 : N) ) / DS (1 : N) 

RETURN 

END 

SUBROUTINE MATELM 
PARAMETER(N-32 , M-3 3 ) 


DIMENSION X(M) ,Y(M) ,XC(N) , YC(N) ,DS(N) , FN (N , N) 

1 , FT ( N , N ) , RHS (N) , SDE (N) ,CI(N) ,SI(N) 

2 , QTS ( N ) ,QNS(H) , QNK ( N ) , QTK ( N ) ,UU£N) ,W(N) 

3 , SDE2 (N,N) , FTSDE2 ( N , N ) , FNSDE2 ( N , N ) ,DUM(N) 

4 , PP (N) , QEX (N) , CPN (N) 

COMMON X, Y, XC, YC, DS, FN, FT, RHS, PI , CPI , Cl , SI 
1, UINF, VINF, SDE 

SDE 2 -SPREAD (SDE, DIM-1, NCOPIES-N) 

FTSDE2-FT * SDE2 
FNSDE2-FN * SDE2 
QTS- SUM ( ARRAY-FTSDE2 , DIM-2 ) 

QNS-SUM ( ARRAY-FNSDE2 , DIM-2 ) 

QNK-QNS + VINF*CI -UINF*SI 
QTK “QTS + VINF*SI +UINF*CI 
UU-UINF - QNS * SI + QTS *CI 
W-VINF + QNS * Cl + QTS *SI 
PP-1. -UU*UU-W*W 
CPN--PP/SQRT(1.0-FMN*FMN) 

DUM—B*B*XC 

DUM— YC* YC+DUM*DUM 

QEX= (1.0 + B) *YC/SQUT (DUM) 

WRITE (6,1) N 
WRITE (0,2) -'XC 
WRITE (6,3) N 
WRITE (6,2) CPN 

1 FQRMAT(/, 2X, 'XC(I) , 1=1, ' , 15) 

2 FORMAT (2X, 8F10 . 3 ) 

3 FORMAT(/, 2X, 'CPN (I) , 1=1, ' , IS) 

RETURN 

END 


CALCULATES MATRIX ELEMENTS AND RHS. 


DIMENSION X (M) , Y(M) ,XC(N) , YC(N) , DS (N) , FN (N , N) 

1 , FT ( N , N ) , RHS ( N ) ,SDE(N) ,CI(N) ,SI(N) 

2 , DYJ (N, N) , DXJ ( N , N) , SPH(N,N) , XD(N,N) , YD ( N , N ) 

3 , RK J ( N , N ) , BKJ (N, N) , ALJ (N,M) , CKJ (N,N) ,ZIK(N,N) 

4 , ETK ( N , N ) , R1S { N, N) , R2S (N, N) , QT (N , N) , DEN (N, N) 

5 , CNM (N , N) , QN (N , N) ,UKJ(N,N) ,VKJ(N,N) 

6 , DS2 (N,N) ,CI2 (N, N) , SI2 (N,N) 

7 , XC2 ( N , N ) , YC2 (N, N) ,XC3 (N,N) , YC3 (H, N) 

8 , SI 3 (N,N) , Cl 3 (N , N) 

LOGICAL KAIN_DIAC(N,N) 

COMMON X, Y, XC f YC, DS, FN, FT, RHS, PI, CPI, Cl, SI 

1, UINF, VINF, SDE 

XC2 » SPREAD (XC, DIM-1, NCOPIES-N) 

YC2 - SPREAD (YC, DIM-1, NCOPIES-N) 

XC3 - SPREAD (XC, DIM-2 , NCOPIES-N) 

YC3 « SPREAD ( YC, DIM-2 , NCOPIES-N) 

SI 2 “SPREAD (SI , DIM-1 , NCOPIES-N) 

Cl 2 “SPREAD (Cl , DIM-1, NCOPIES-N) 

SI 3 —SPREAD (SI , DIM-2, NCOPIES-N) 

CI3-SPREAD (Cl , DIM-2 , NCOPIES-N) 

DS2-SPREAD (DS , DIM-1 , NCOPIES-N) 
MAIN_DlAC-DIACONAL { SPREAD ( .TRUE. , 1,N) , .FALSE.) 
WHERE (MAIN_DIAG ) 

FN - 2.0 * PI 
FT - 0.0 
ELSEWHERE 

DYJ - SI2 * DS2 
DXJ - CI2 * DS2 
SPH - DS2 * 0.5 
XD - XC3 - Xp2 
YD - YC3 - YC2 
RKJ“SQRT(XD*XD +YD*YD) 

BKJ -AT AN 2 ( YD , XD) 

ALJ-ATAJ{2 (DYJ, DXJ) 

CKJ-ALJ-BKJ 
ZIK-RKJ*COS (CKJ) 

ETK— RKJ*SIN(CKJ) 

R1S“( (ZIK+SPH) **2) + ETK* ETK 
R2S“( (ZIK-SPH) **2) + ETK* ETK 
QT“ALOG (R1S/R2S) 

DEN— ZIK*ZIK + ETK* ETK - SPH*SPH 

CNM=ETK*DS2 

QN-2 . 0* AT AN 2 (GNM, DEN) 

UKJ-QT*CI2-QN*SI2 
VKJ-QT*SI2+QN*CI2 
FN— UKJ*SI3+VKJ*CI3 
FT-UKJ*CI3tVKJ*SI3 
ENDWHERE 

RHS— UINF* SI-VINF*CI 

RETURN 

END 

SUBROUTINE SURVL ( B , FMN) 

PARAMETER ( N- 3 2 , M- 3 3 ) 


CALCULATES VELOCITIES AND PRESSURE AT CONTROL 
POINTS 
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